Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
This is a patch to prevent the scheduler from scheduling a compare and its
conditional branch apart, in order to increase the macro-fusion opportunity
on recent x86 platforms. It is motivated by the following small testcase.

double __attribute__ ((noinline)) bar (double sum);
int a[100];

double bar (double sum)
{
  int i;
  for (i = 0; i < 100; i++)
    sum += (0.5 + (a[i%100] - 128));
  return sum;
}

int main() {
  double total;
  int i;
  for (i = 0; i < 1000; i++)
    total += bar (i);
  return total != 0.333;
}

~/workarea/gcc-r201963/build/install/bin/gcc -O2 -mtune=corei7-avx 1.c -o 1.out

The binary of the kernel loop in func bar () is:

  401180:  89 c8                   mov    %ecx,%eax
  401182:  66 0f 57 c9             xorpd  %xmm1,%xmm1
  401186:  f7 ee                   imul   %esi
  401188:  89 c8                   mov    %ecx,%eax
  40118a:  c1 f8 1f                sar    $0x1f,%eax
  40118d:  c1 fa 05                sar    $0x5,%edx
  401190:  29 c2                   sub    %eax,%edx
  401192:  b8 64 00 00 00          mov    $0x64,%eax
  401197:  0f af d0                imul   %eax,%edx
  40119a:  89 c8                   mov    %ecx,%eax
  40119c:  83 c1 01                add    $0x1,%ecx
  40119f:  29 d0                   sub    %edx,%eax
  4011a1:  48 98                   cltq
  4011a3:  8b 04 85 60 51 6c 00    mov    0x6c5160(,%rax,4),%eax
  4011aa:  83 c0 80                add    $0xffffff80,%eax
  4011ad:  81 f9 40 42 0f 00       cmp    $0xf4240,%ecx
  4011b3:  f2 0f 2a c8             cvtsi2sd %eax,%xmm1
  4011b7:  f2 0f 58 ca             addsd  %xmm2,%xmm1
  4011bb:  f2 0f 58 c1             addsd  %xmm1,%xmm0
  4011bf:  75 bf                   jne    401180

Here cmp (addr: 4011ad) and jne (addr: 4011bf) are not consecutive in the
object code, but they are consecutive before the sched2 pass. If we
manually keep the cmp and jne together, the runtime of 1.out improves from
2.40s to 2.31s on a Sandybridge machine. Perf stat results show that the
UOPS_RETIRED.MACRO_FUSED event increases from 131,075 to 1,000,130,308,
and the UOPS_RETIRED.ANY event decreases from 23,002,543,637 to
22,002,511,525.

The patch reschedules the cmp and jmp to make them consecutive. It is done
at the end of scheduling each block, before the schedule result is
committed.

Bootstrapped and regression-tested OK on x86_64-linux-gnu. OK for trunk?

2013-09-03  Wei Mi

	* haifa-sched.c (move_insns): New function.
	(adjust_for_macro_fusion): Ditto.
	(schedule_block): Call adjust_for_macro_fusion before committing
	the schedule.
	* doc/tm.texi.in: Generated.
	* doc/tm.texi: Ditto.
	* config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
	X86_TUNE_FUSE_CMP_AND_BRANCH.
	* config/i386/i386.c (ix86_macro_fusion_p): New function.
	(ix86_macro_fusion_pair_p): Ditto.
	* target.def: Add macro_fusion_p and macro_fusion_pair_p in
	sched group.

Index: haifa-sched.c
===================================================================
--- haifa-sched.c	(revision 201963)
+++ haifa-sched.c	(working copy)
@@ -5605,6 +5605,56 @@ choose_ready (struct ready_list *ready,
     }
 }

+/* Move insn scheduled_insns[I] to the position J in scheduled_insns.  */
+
+static void
+move_insns (int i, int j)
+{
+  rtx insn = scheduled_insns[i];
+  scheduled_insns.ordered_remove (i);
+  scheduled_insns.safe_insert (j, insn);
+}
+
+/* If the last cond jump and the cond register setting insn are consecutive
+   before scheduling, and are scheduled away from each other, this func
+   tries to rearrange insns in scheduled_insns and keep those two insns
+   together.  This is good for performance on microarchitectures supporting
+   macro-fusion.  */
+
+static void
+adjust_for_macro_fusion ()
+{
+  int i = -1, length;
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  rtx insn;
+  rtx last = scheduled_insns.last();
+
+  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+  length = scheduled_insns.length ();
+  if (any_condjump_p (last) && reg_referenced_p (cc_reg_1, PATTERN (last)))
+    {
+      for (i = length - 2; i >= 0; i--)
+	{
+	  insn = scheduled_insns[i];
+	  if (modified_in_p (cc_reg_1, insn))
+	    break;
+	}
+    }
+  if (i < 0 || i == length - 2)
+    return;
+
+  if (NEXT_INSN (insn) != last)
+    return;
+
+  if (!targetm.sched.macro_fusion_pair_p
+      || !targetm.sched.macro_fusion_pair_p (insn, last))
+    return;
+
+  move_insns (i, length - 2);
+}
+
 /* This function is called when we have successfully scheduled a
    block.  It uses the schedule stored in the scheduled_insns vector
    to rearrange the RTL.  PREV_HEAD is used as the anchor to which we
@@ -6421,6 +6471,9 @@ schedule_block (basic_block *target_bb,

   if (success)
     {
+      if (
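For reference, a perf invocation along these lines can collect the fusion
counts quoted above; the symbolic event names are taken from the message,
and whether the kernel's perf tool exposes them under exactly these
aliases depends on the CPU and perf version:

  perf stat -e uops_retired.macro_fused,uops_retired.any ./1.out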
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Thanks for the suggestions! I took a look at adjust_priority, and found it
may not guarantee to schedule the cmp and jmp together. The priority is
used to choose a candidate from the ready list. If the cmp is the only
insn in the ready list and there is another insn-A in the queued set
(insn-A's dependences have been resolved, but it is not ready because of a
data delay or resource delay), then the cmp will be scheduled before
insn-A no matter what their priorities are. I will take a look at whether
SCHED_GROUP is going to work.

On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov wrote:
> On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>>
>> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
>> > Hello,
>> >
>> > Could you use the existing facilities instead, such as the
>> > adjust_priority hook, or making the compare-branch insn sequence
>> > a SCHED_GROUP?
>>
>> Or a define_bypass?
>
> Hm, I don't think define_bypass would work: it still leaves the
> scheduler freedom to move the compare up.
>
> IMO adjust_priority would be preferable if it allows to achieve the goal.
>
> Alexander
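A minimal sketch of the SCHED_GROUP idea, assuming GCC's existing
scheduler helpers (any_condjump_p, prev_nonnote_insn, SCHED_GROUP_P, and
the fixed_condition_code_regs hook); the actual implementation appears
later in this thread and differs in detail:

/* Sketch only: put a conditional jump into a sched group with the
   preceding insn that sets the condition-code register, so the
   scheduler keeps the two adjacent.  */
static void
try_group_insn (rtx insn)
{
  unsigned int condreg1, condreg2;
  rtx cc_reg_1, prev;

  if (!any_condjump_p (insn))
    return;

  targetm.fixed_condition_code_regs (&condreg1, &condreg2);
  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
  prev = prev_nonnote_insn (insn);
  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
      || !prev
      || !modified_in_p (cc_reg_1, prev))
    return;

  /* SCHED_GROUP_P forces INSN to issue immediately after PREV.  */
  if (targetm.sched.macro_fusion_pair_p
      && targetm.sched.macro_fusion_pair_p (prev, insn))
    SCHED_GROUP_P (insn) = 1;
}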
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
SCHED_GROUP works after I add chain_to_prev_insn after
add_branch_dependences, in order to chain control dependences to the prev
insn for the sched group. Here is the new patch. Testing is ongoing.

Thanks,
Wei Mi.

2013-09-06  Wei Mi

	* config/i386/i386.c (ix86_macro_fusion_p): New function.
	(ix86_macro_fusion_pair_p): Ditto.
	* config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
	X86_TUNE_FUSE_CMP_AND_BRANCH.
	* sched-deps.c (group_insns_for_macro_fusion): New function.
	(sched_analyze_insn): Call group_insns_for_macro_fusion.
	(chain_to_prev_insn): Change it from static to extern.
	(chain_to_prev_insn_p): Ditto.
	* doc/tm.texi: Generated.
	* doc/tm.texi.in: Ditto.
	* sched-int.h: New declarations.
	* sched-rgn.c (add_branch_dependences): Chain control
	dependences to prev insn for sched group.
	* target.def: Add macro_fusion_p and macro_fusion_pair_p.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 201963)
+++ config/i386/i386.c	(working copy)
@@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  if (!strcmp (ix86_tune_string, "corei7"))
+    {
+      /* For Nehalem.  */
+      rtx single_set = single_set (condgen);
+      /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
+      if (single_set == NULL_RTX)
+        return false;
+
+      src = SET_SRC (single_set);
+      if (GET_CODE (src) != COMPARE)
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
+	 insn pattern.  */
+      if ((MEM_P (XEXP (src, 0))
+	   && CONST_INT_P (XEXP (src, 1)))
+	  || (MEM_P (XEXP (src, 1))
+	      && CONST_INT_P (XEXP (src, 0))))
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
+      if (get_attr_type (condgen) != TYPE_TEST
+	  && get_attr_type (condgen) != TYPE_ICMP)
+	return false;
+      return true;
+    }
+  else if (!strcmp (ix86_tune_string, "corei7-avx"))
+    {
+      /* For Sandybridge.  */
+      enum rtx_code ccode;
+      rtx compare_set = NULL_RTX, test_if, cond;
+      rtx single_set = single_set (condgen);
+      if (single_set != NULL_RTX)
+        compare_set = single_set;
+      else
+	{
+	  int i;
+	  rtx pat = PATTERN (condgen);
+	  for (i = 0; i < XVECLEN (pat, 0); i++)
+	    if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+		&& GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
+	      compare_set = XVECEXP (pat, 0, i);
+	}
+
+      if (compare_set == NULL_RTX)
+	return false;
+      src = SET_SRC (compare_set);
+      if (GET_CODE (src) != COMPARE)
+	return false;
+
+      /* Sandybridge doesn't support macro-fusion for cmp/test MEM-IMM
+	 insn pattern.  */
+      if ((MEM_P (XEXP (src, 0))
+	   && CONST_INT_P (XEXP (src, 1)))
+	  || (MEM_P (XEXP (src, 1))
+	      && CONST_INT_P (XEXP (src, 0))))
+        return false;
+
+      /* Sandybridge doesn't support macro-fusion for inc/dec +
+	 unsigned comparison jmp.  */
+      test_if = SET_SRC (pc_set (condjmp));
+      cond = XEXP (test_if, 0);
+      ccode = GET_CODE (cond);
+      if (get_attr_type (condgen) == TYPE_INCDEC
+	  && (ccode == GEU
+	      || ccode == GTU
+	      || ccode == LEU
+	      || ccode == LTU))
+	return false;
+      return true;
+    }
+  return false;
+}
+
 /* Try to reorder ready list to take advantage of Atom pipelined IMUL
    execution.  It is applied if
    (1) IMUL instruction is on the top of list;

@@ -42982,6 +43075,10 @@ ix86_memmodel_check (unsigned HOST_WIDE_
 #undef TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD \
   ia32_multipass_dfa_lookahead
+#undef TARGET_SCHED_MACRO_FUSION_P
+#define TARGET_SCHED_MACRO_FUSION_P ix86_macro_fusion_p
+#undef TARGET_SCHED_MACRO_FUSION_PAIR_P
+#define TARGET_SCHED_MACRO_FUSION_PAIR_P ix86_macro_fusion_pair_p

 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL ix86_function_ok_for_sibcall

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 201963)
+++ config/i386/x86-tune.def	(workin
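To illustrate the last Sandybridge check above: an inc/dec followed by an
unsigned-condition jump is a pair the code deliberately leaves unfused
(hand-written example, not from the testcase):

  incl   %eax         # TYPE_INCDEC
  jae    .L5          # GEU condition: inc/dec + unsigned-comparison jmp does not fuse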
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Add a testcase. Bootstrap and regression are OK for the patch in the last
mail.

2013-09-09  Wei Mi

	* gcc/testsuite/gcc.dg/macro-fusion-1.c: New.

Index: gcc/testsuite/gcc.dg/macro-fusion-1.c
===================================================================
--- gcc/testsuite/gcc.dg/macro-fusion-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/macro-fusion-1.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -mtune=corei7 -fdump-rtl-sched2" } */
+/* { dg-final { scan-rtl-dump-not "compare.*insn.*jump_insn.*jump_insn" "sched2" } } */
+
+int a[100];
+
+double bar (double sum)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    sum += (0.5 + (a[i%100] - 128));
+  return sum;
+}
+

On Fri, Sep 6, 2013 at 10:39 AM, Wei Mi wrote:
> SCHED_GROUP works after I add chain_to_prev_insn after
> add_branch_dependences, in order to chain control dependences to prev
> insn for sched group. Here is the new patch. Testing is going on.
>
> Thanks,
> Wei Mi.
>
> 2013-09-06  Wei Mi
>
> 	* config/i386/i386.c (ix86_macro_fusion_p): New function.
> 	(ix86_macro_fusion_pair_p): Ditto.
> 	* config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
> 	X86_TUNE_FUSE_CMP_AND_BRANCH.
> 	* sched-deps.c (group_insns_for_macro_fusion): New function.
> 	(sched_analyze_insn): Call group_insns_for_macro_fusion.
> 	(chain_to_prev_insn): Change it from static to extern.
> 	(chain_to_prev_insn_p): Ditto.
> 	* doc/tm.texi: Generated.
> 	* doc/tm.texi.in: Ditto.
> 	* sched-int.h: New declarations.
> 	* sched-rgn.c (add_branch_dependences): Chain control
> 	dependences to prev insn for sched group.
> 	* target.def: Add macro_fusion_p and macro_fusion_pair_p.
>
> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c	(revision 201963)
> +++ config/i386/i386.c	(working copy)
> @@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void)
>      }
>  }
>
> +/* Return true if target platform supports macro-fusion.  */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH)
> +    return true;
> +  else
> +    return false;
> +}
> +
> +/* Check whether current microarchitecture support macro fusion
> +   for insn pair "CONDGEN + CONDJMP".  Refer to
> +   "Intel Architectures Optimization Reference Manual".  */
> +
> +static bool
> +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
> +{
> +  rtx src;
> +  if (!strcmp (ix86_tune_string, "corei7"))
> +    {
> +      /* For Nehalem.  */
> +      rtx single_set = single_set (condgen);
> +      /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
> +      if (single_set == NULL_RTX)
> +        return false;
> +
> +      src = SET_SRC (single_set);
> +      if (GET_CODE (src) != COMPARE)
> +	return false;
> +
> +      /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
> +	 insn pattern.  */
> +      if ((MEM_P (XEXP (src, 0))
> +	   && CONST_INT_P (XEXP (src, 1)))
> +	  || (MEM_P (XEXP (src, 1))
> +	      && CONST_INT_P (XEXP (src, 0))))
> +	return false;
> +
> +      /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
> +      if (get_attr_type (condgen) != TYPE_TEST
> +	  && get_attr_type (condgen) != TYPE_ICMP)
> +	return false;
> +      return true;
> +    }
> +  else if (!strcmp (ix86_tune_string, "corei7-avx"))
> +    {
> +      /* For Sandybridge.  */
> +      enum rtx_code ccode;
> +      rtx compare_set = NULL_RTX, test_if, cond;
> +      rtx single_set = single_set (condgen);
> +      if (single_set != NULL_RTX)
> +        compare_set = single_set;
> +      else
> +	{
> +	  int i;
> +	  rtx pat = PATTERN (condgen);
> +	  for (i = 0; i < XVECLEN (pat, 0); i++)
> +	    if (GET_CODE (XVECEXP (pat, 0, i)) == SET
> +		&& GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
> +	      compare_set = XVECEXP (pat, 0, i);
> +	}
> +
> +      if (compare_set == NULL_RTX)
> +	return false;
> +      src = SET_SRC (compare_set);
> +      if (GET_CODE (src) != COMPARE)
> +	return false;
> +
> +      /* Sandybridge doesn't support macro-fusion for cmp/test MEM-IMM
> +	 insn pattern.  */
> +      if ((MEM_P (XEXP (src, 0))
> +	   && CONST_INT_P (XEXP (src, 1)))
> +	  || (MEM_P (XEXP (src, 1))
> +	      && CONST_INT_P (XEXP
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Because deps_analyze_insn only analyzes data deps but no control deps.
Control deps are added by add_branch_dependences. Without the
chain_to_prev_insn at the end of add_branch_dependences, the jmp will be
control dependent on every previous insn in the same bb, and the cmp and
jmp group could still be scheduled apart, since they will not be put into
the ready list at the same time.

On Tue, Sep 10, 2013 at 4:44 AM, Alexander Monakov wrote:
>
> On Fri, 6 Sep 2013, Wei Mi wrote:
>
>> SCHED_GROUP works after I add chain_to_prev_insn after
>> add_branch_dependences, in order to chain control dependences to prev
>> insn for sched group.
>
> chain_to_prev_insn is done in the end of deps_analyze_insn, why is that
> not sufficient?
>
> Alexander
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
I tried that and it caused some regressions, so I chose to do
chain_to_prev_insn another time in add_branch_dependences. There could be
some dependence between those two functions.

On Wed, Sep 11, 2013 at 2:58 AM, Alexander Monakov wrote:
>
> On Tue, 10 Sep 2013, Wei Mi wrote:
>
>> Because deps_analyze_insn only analyzes data deps but no control deps.
>> Control deps are included by add_branch_dependences. Without the
>> chain_to_prev_insn in the end of add_branch_dependences, jmp will be
>> control dependent on every previous insn in the same bb, and the cmp
>> and jmp group could still be scheduled apart since they will not be
>> put in ready list at the same time.
>
> Would calling add_branch_dependences before sched_analyze solve that, then?
>
> Alexander
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Taking the same issue slot is not enough for x86. The compare and branch
need to be consecutive in the binary to be macro-fused on x86.

Thanks,
Wei Mi.

On Wed, Sep 11, 2013 at 10:45 AM, Andrew Pinski wrote:
> On Wed, Sep 4, 2013 at 12:33 PM, Alexander Monakov wrote:
>> On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>>>
>>> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
>>> > Hello,
>>> >
>>> > Could you use the existing facilities instead, such as the
>>> > adjust_priority hook, or making the compare-branch insn sequence
>>> > a SCHED_GROUP?
>>>
>>> Or a define_bypass?
>>
>> Hm, I don't think define_bypass would work: it still leaves the
>> scheduler freedom to move the compare up.
>
> Even though it allows the scheduler freedom to move the compare up,
> the scheduler does so due to the scheduling model not being correct for
> the processor. I have done the same for Octeon2, where it is able to
> combine the compare and the branch, and found the resulting schedule is
> much better than even what this hack could do, due to the instructions
> still taking an issue slot. Is it true that for these two processors it
> takes an issue slot, or is it being done before issue?
>
> Thanks,
> Andrew Pinski
>
>>
>> IMO adjust_priority would be preferable if it allows to achieve the goal.
>>
>> Alexander
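Concretely, referring back to the disassembly in the first mail, the pair
can only fuse when emitted back to back:

  cmp    $0xf4240,%ecx     # flags producer
  jne    401180            # fusable only when it immediately follows the cmp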
[PATCH] disable use_vector_fp_converts for m_CORE_ALL
For the following testcase 1.c, on Westmere and Sandybridge, performance
with the option -mtune-ctrl=^use_vector_fp_converts is better (improves
from 3.46s to 2.83s). It means cvtss2sd is often better than
unpcklps+cvtps2pd on recent x86 platforms.

1.c:
float total = 0.2;
int k = 5;

int main() {
  int i;

  for (i = 0; i < 10; i++) {
    total += (0.5 + k);
  }

  return total == 0.3;
}

assembly generated by gcc-r201963 without -mtune-ctrl=^use_vector_fp_converts:
.L2:
        unpcklps %xmm0, %xmm0
        subl    $1, %eax
        cvtps2pd %xmm0, %xmm0
        addsd   %xmm1, %xmm0
        unpcklpd %xmm0, %xmm0
        cvtpd2ps %xmm0, %xmm0
        jne     .L2

assembly generated by gcc-r201963 with -mtune-ctrl=^use_vector_fp_converts:
.L2:
        cvtss2sd %xmm0, %xmm0
        subl    $1, %eax
        addsd   %xmm1, %xmm0
        cvtsd2ss %xmm0, %xmm0
        jne     .L2

But for testcase 2.c (thanks to Igor Zamyatin for the testcase),
performance with the option -mtune-ctrl=^use_vector_fp_converts is worse.
Analysis of the assembly shows the performance degradation comes from a
partial reg stall caused by cvtsd2ss. Adding "pxor %xmm0, %xmm0" before
"cvtsd2ss b(,%rdx,8), %xmm0" gets the performance back.

2.c:
double b[1024];
float a[1024];

int main()
{
  int i;
  for (i = 0; i < 1024 * 1024 * 256; i++)
    a[i & 1023] = a[i & 1023] * (float)b[i & 1023];
  return (int)a[512];
}

without -mtune-ctrl=^use_vector_fp_converts:
.L2:
        movl    %eax, %edx
        addl    $1, %eax
        andl    $1023, %edx
        cmpl    $268435456, %eax
        movsd   b(,%rdx,8), %xmm0
        cvtpd2ps %xmm0, %xmm0       ==> no partial reg stall because of movsd
        mulss   a(,%rdx,4), %xmm0
        movss   %xmm0, a(,%rdx,4)
        jne     .L2

with -mtune-ctrl=^use_vector_fp_converts:
.L2:
        movl    %eax, %edx
        addl    $1, %eax
        andl    $1023, %edx
        cmpl    $268435456, %eax
        cvtsd2ss b(,%rdx,8), %xmm0  ==> partial reg stall; needs "pxor
                                        %xmm0, %xmm0" inserted before it
        mulss   a(,%rdx,4), %xmm0
        movss   %xmm0, a(,%rdx,4)
        jne     .L2

So the patch is to turn off use_vector_fp_converts for m_CORE_ALL to use
cvtss2sd/cvtsd2ss directly, and to add "pxor %xmmreg, %xmmreg" before
cvtss2sd/cvtsd2ss to break the partial reg stall (similar to what r201308
does for cvtsi2ss/cvtsi2sd). Bootstrap and regression pass. OK for trunk?

Thanks,
Wei Mi.

2013-09-11  Wei Mi

	* config/i386/x86-tune.def (DEF_TUNE): Remove m_CORE_ALL.
	* config/i386/i386.md: Add define_peephole2 to break partial
	reg stall for cvtss2sd/cvtsd2ss.

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 201963)
+++ config/i386/x86-tune.def	(working copy)
@@ -189,7 +189,7 @@ DEF_TUNE (X86_TUNE_NOT_VECTORMODE, "not_
 /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
    from FP to FP.  */
 DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts",
-          m_CORE_ALL | m_AMDFAM10 | m_GENERIC)
+          m_AMDFAM10 | m_GENERIC)
 /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
    from integer to FP.  */
 DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md	(revision 201963)
+++ config/i386/i386.md	(working copy)
@@ -5075,6 +5075,63 @@
   emit_move_insn (operands[0], CONST0_RTX (mode));
 })

+;; Break partial reg stall for cvtsd2ss.
+
+(define_peephole2
+  [(set (match_operand:SF 0 "register_operand")
+        (float_truncate:SF
+	  (match_operand:DF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_SSE_MATH
+   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+   && optimize_function_for_speed_p (cfun)
+   && reload_completed && SSE_REG_P (operands[0])
+   && peep2_reg_dead_p (0, operands[0])
+   && (!SSE_REG_P (operands[1])
+       || REGNO (operands[0]) != REGNO (operands[1]))"
+  [(set (match_dup 0)
+	(vec_merge:V4SF
+	  (vec_duplicate:V4SF
+	    (float_truncate:V2SF
+	      (match_dup 1)))
+	  (match_dup 0)
+	  (const_int 1)))]
+{
+  operands[0] = simplify_gen_subreg (V4SFmode, operands[0],
+				     SFmode, 0);
+  operands[1] = simplify_gen_subreg (V2DFmode, operands[1],
+				     DFmode, 0);
+  emit_move_insn (operands[0], CONST0_RTX (V4SFmode));
+})
+
+;; Break partial reg stall for cvtss2sd.
+
+(define_peephole2
+  [(set (match_operand:DF 0 "register_operand")
+        (float_extend:DF
+	  (match_operand:SF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 &
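For reference, the loop from 2.c after the peephole fires would look
roughly like this (a sketch based on the description above, not actual
compiler output):

.L2:
        movl    %eax, %edx
        addl    $1, %eax
        andl    $1023, %edx
        cmpl    $268435456, %eax
        pxor    %xmm0, %xmm0        # clears xmm0, breaking the dependence
                                    # on its stale upper bits
        cvtsd2ss b(,%rdx,8), %xmm0
        mulss   a(,%rdx,4), %xmm0
        movss   %xmm0, a(,%rdx,4)
        jne     .L2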
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Thanks! Your method to adjust 'last' is more concise. I tried it and it
works for small testcases. Bootstrap and regression are OK. More
performance testing is going on.

I agree with you that explicit handling in sched-deps.c for this feature
doesn't look good. So I moved it to sched_init (instead of
ix86_sched_init_global, because ix86_sched_init_global is used to install
scheduling hooks), and then it is possible for other architectures to use
it. I also need the two hooks because one is used as the gate for
macro-fusion, controlled by -mtune-ctrl=fuse_cmp_and_branch on x86, and
the other is used to check for which kind of cmp and branch pair
macro-fusion is supported on the target platform. But I am not sure if it
is proper to put those two hooks under the TARGET_SCHED hook vector.

Thanks,
Wei Mi.

updated patch:

Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 201771)
+++ doc/tm.texi.in	(working copy)
@@ -6455,6 +6455,10 @@ scheduling one insn causes other insns t
 cycle.  These other insns can then be taken into account properly.
 @end deftypefn

+@hook TARGET_SCHED_MACRO_FUSION_P
+
+@hook TARGET_SCHED_MACRO_FUSION_PAIR_P
+
 @hook TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK
 This hook is called after evaluation forward dependencies of insns in
 chain given by two parameter values (@var{head} and @var{tail}

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 201771)
+++ doc/tm.texi	(working copy)
@@ -6551,6 +6551,17 @@ scheduling one insn causes other insns t
 cycle.  These other insns can then be taken into account properly.
 @end deftypefn

+@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_P (void)
+This hook is used to check whether target platform supports macro fusion.
+@end deftypefn
+
+@deftypefn {Target Hook} bool TARGET_SCHED_MACRO_FUSION_PAIR_P (rtx @var{condgen}, rtx @var{condjmp})
+This hook is used to check whether two insns could be macro fused for
+target microarchitecture.  If this hook returns true for the given insn pair
+(@var{condgen} and @var{condjmp}), scheduler will put them into a sched
+group, and they will not be scheduled apart.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_SCHED_DEPENDENCIES_EVALUATION_HOOK (rtx @var{head}, rtx @var{tail})
 This hook is called after evaluation forward dependencies of insns in
 chain given by two parameter values (@var{head} and @var{tail}

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 201771)
+++ config/i386/i386.c	(working copy)
@@ -2004,7 +2004,7 @@ static unsigned int initial_ix86_tune_fe
   /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
      with a subsequent conditional jump instruction into a single
      compare-and-branch uop.  */
-  m_BDVER,
+  m_COREI7 | m_BDVER,

   /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag
      will impact LEA instruction selection. */
@@ -24845,6 +24845,99 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  if (!strcmp (ix86_tune_string, "corei7"))
+    {
+      /* For Nehalem.  */
+      rtx single_set = single_set (condgen);
+      /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
+      if (single_set == NULL_RTX)
+        return false;
+
+      src = SET_SRC (single_set);
+      if (GET_CODE (src) != COMPARE)
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
+	 insn pattern.  */
+      if ((MEM_P (XEXP (src, 0))
+	   && CONST_INT_P (XEXP (src, 1)))
+	  || (MEM_P (XEXP (src, 1))
+	      && CONST_INT_P (XEXP (src, 0))))
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
+      if (get_attr_type (condgen) != TYPE_TEST
+	  && get_attr_type (condgen) != TYPE_ICMP)
+	return false;
+      return true;
+    }
+  else if (!strcmp (ix86_tune_string, "corei7-avx"))
+    {
+      /* For Sandybridge.  */
+      enum rtx_code ccode;
+      rtx compare_set = NULL_RTX, test_if, cond;
+      rtx single_set = single_set (condgen);
+      if (single_set != NULL_RTX)
+        compare_set = single_set;
+      else
+	{
+	  int i;
+	  rtx pat = PATTERN (condgen);
+	  for (i = 0; i < XVECLEN (pat, 0); i++)
+	    if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+
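For context, the scheduler side is expected to consult the two hooks
roughly like this (a sketch, assuming the grouping happens in sched_init
as described above; not part of the patch text itself):

  /* Sketch: skip the grouping entirely when the target does not
     declare macro-fusion support.  */
  if (targetm.sched.macro_fusion_p
      && targetm.sched.macro_fusion_p ())
    group_insns_for_macro_fusion ();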
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
> Your new implementation is not efficient: when looping over BBs, you
> need to look only at the last insn of each basic block.

Thanks, fixed. New patch attached.

patch
Description: Binary data
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
> Thanks. At this point you need feedback from x86 and scheduler
> maintainers. I would recommend you to resubmit the patch with a
> ChangeLog text, and with the text of the patch inline in the email
> (your last mail has the patch as a binary attachment, which makes it
> harder to review and respond to). Please mention if the updated patch
> passes bootstrap and regtest.

Thanks! Here is the new patch. Bootstrap and regression pass. OK for trunk?

2013-09-13  Wei Mi

	* sched-rgn.c (add_branch_dependences): Keep insns in
	a SCHED_GROUP at the end of bb to remain their locations.
	* config/i386/x86-tune.def (DEF_TUNE): Add m_COREI7 for
	X86_TUNE_FUSE_CMP_AND_BRANCH.
	* config/i386/i386.c (ix86_macro_fusion_p): New function.
	(ix86_macro_fusion_pair_p): Ditto.
	* doc/tm.texi.in: Generated.
	* doc/tm.texi: Ditto.
	* target.def: Add two hooks: macro_fusion_p and
	macro_fusion_pair_p.
	* haifa-sched.c (try_group_insn): New function.
	(group_insns_for_macro_fusion): New function.
	(sched_init): Call group_insns_for_macro_fusion.

Index: sched-rgn.c
===================================================================
--- sched-rgn.c	(revision 201963)
+++ sched-rgn.c	(working copy)
@@ -2443,6 +2443,8 @@ add_branch_dependences (rtx head, rtx ta
    cc0 setters remain at the end because they can't be moved away from
    their cc0 user.

+   Predecessors of SCHED_GROUP_P instructions at the end remain at the end.
+
    COND_EXEC insns cannot be moved past a branch (see e.g. PR17808).

    Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually return
@@ -2465,7 +2467,8 @@ add_branch_dependences (rtx head, rtx ta
 #endif
 	 || (!reload_completed
 	     && sets_likely_spilled (PATTERN (insn)))
-	 || NOTE_P (insn))
+	 || NOTE_P (insn)
+	 || (last != 0 && SCHED_GROUP_P (last)))
     {
       if (!NOTE_P (insn))
 	{

Index: config/i386/x86-tune.def
===================================================================
--- config/i386/x86-tune.def	(revision 201963)
+++ config/i386/x86-tune.def	(working copy)
@@ -196,7 +196,8 @@ DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS,
 /* X86_TUNE_FUSE_CMP_AND_BRANCH: Fuse a compare or test instruction
    with a subsequent conditional jump instruction into a single
    compare-and-branch uop.  */
-DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch", m_BDVER)
+DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH, "fuse_cmp_and_branch",
+          m_COREI7 | m_BDVER)
 /* X86_TUNE_OPT_AGU: Optimize for Address Generation Unit. This flag
    will impact LEA instruction selection. */
 DEF_TUNE (X86_TUNE_OPT_AGU, "opt_agu", m_ATOM | m_SLM)

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 201963)
+++ config/i386/i386.c	(working copy)
@@ -24850,6 +24850,99 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  if (!strcmp (ix86_tune_string, "corei7"))
+    {
+      /* For Nehalem.  */
+      rtx single_set = single_set (condgen);
+      /* Nehalem doesn't support macro-fusion for add/sub+jmp.  */
+      if (single_set == NULL_RTX)
+        return false;
+
+      src = SET_SRC (single_set);
+      if (GET_CODE (src) != COMPARE)
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for cmp/test MEM-IMM
+	 insn pattern.  */
+      if ((MEM_P (XEXP (src, 0))
+	   && CONST_INT_P (XEXP (src, 1)))
+	  || (MEM_P (XEXP (src, 1))
+	      && CONST_INT_P (XEXP (src, 0))))
+	return false;
+
+      /* Nehalem doesn't support macro-fusion for add/sub/dec/inc + jmp.  */
+      if (get_attr_type (condgen) != TYPE_TEST
+	  && get_attr_type (condgen) != TYPE_ICMP)
+	return false;
+      return true;
+    }
+  else if (!strcmp (ix86_tune_string, "corei7-avx"))
+    {
+      /* For Sandybridge.  */
+      enum rtx_code ccode;
+      rtx compare_set = NULL_RTX, test_if, cond;
+      rtx single_set = single_set (condgen);
+      if (single_set != NULL_RTX)
+        compare_set = single_set;
+      else
+	{
+	  int i;
+	  rtx pat = PATTERN (condgen);
+	  for (i = 0; i < XVECLEN (pat, 0); i++)
+	    if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+		&& GET_CODE (SET_SRC (XVECEXP (pat, 0, i))) == COMPARE)
+	      compare_set = XVECEXP (pat, 0, i);
+
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
> Checking corei7/corei7-avx explicitly isn't a good idea.
> It is also useful for Ivy Bridge and Haswell. I think you
> should use a variable to control it, similar to
> TARGET_FUSE_CMP_AND_BRANCH.
>
> --
> H.J.

Different x86 microarchitectures support macro-fusion for different
compare and branch combinations, so I need to differentiate various x86
microarchitectures. If I use TARGET_FUSE_CMP_AND_BRANCH-like vars to
control it, it requires a bunch of them. That is why I chose to check
corei7/corei7-avx in that function. I don't add core-avx-i/core-avx2 for
now because I don't have those machines for testing.

Thanks,
Wei Mi.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Fri, Sep 13, 2013 at 12:09 PM, H.J. Lu wrote:
> On Fri, Sep 13, 2013 at 11:28 AM, Wei Mi wrote:
>>> Checking corei7/corei7-avx explicitly isn't a good idea.
>>> It is also useful for Ivy Bridge and Haswell. I think you
>>> should use a variable to control it, similar to
>>> TARGET_FUSE_CMP_AND_BRANCH.
>>>
>>> --
>>> H.J.
>>
>> Different x86 microarchitectures support macro-fusion for different
>> compare and branch combinations. I need to differentiate various x86
>> microarchitectures. If use TARGET_FUSE_CMP_AND_BRANCH like vars to
>> control it, it requires a bunch of them. That is why I choose to check
>
> Can you use TARGET_FUSE_CMP_AND_BRANCH covers cmp/test
> and branch, TARGET_FUSE_ALU_AND_BRANCH covers and/add/sub/inc/dec
> and branch?

Yes, I can. Thanks for the suggestion. Will fix it, and with Ivy Bridge
and Haswell included.

Thanks,
Wei.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Fri, Sep 13, 2013 at 1:45 PM, Wei Mi wrote:
> On Fri, Sep 13, 2013 at 12:09 PM, H.J. Lu wrote:
>> On Fri, Sep 13, 2013 at 11:28 AM, Wei Mi wrote:
>>>> Checking corei7/corei7-avx explicitly isn't a good idea.
>>>> It is also useful for Ivy Bridge and Haswell. I think you
>>>> should use a variable to control it, similar to
>>>> TARGET_FUSE_CMP_AND_BRANCH.
>>>>
>>>> --
>>>> H.J.
>>>
>>> Different x86 microarchitectures support macro-fusion for different
>>> compare and branch combinations. I need to differentiate various x86
>>> microarchitectures. If use TARGET_FUSE_CMP_AND_BRANCH like vars to
>>> control it, it requires a bunch of them. That is why I choose to check
>>
>> Can you use TARGET_FUSE_CMP_AND_BRANCH covers cmp/test
>> and branch, TARGET_FUSE_ALU_AND_BRANCH covers and/add/sub/inc/dec
>> and branch?
>
> Yes, I can. Thanks for the suggestion. Will fix it, and with Ivy
> Bridge and Haswell included.

I just noticed another problem here: processor_type only contains
PROCESSOR_COREI7, so I cannot differentiate Westmere and Sandybridge in
x86-tune.def, which are different for TARGET_FUSE_ALU_AND_BRANCH. So do I
have to separate m_SANDYBRIDGE out from m_COREI7?
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
>> Just notice another problem here:
>> processor_type only contains PROCESSOR_COREI7, so I cannot
>> differentiate Westmere and Sandybridge in x86-tune.def, which are
>> different for TARGET_FUSE_ALU_AND_BRANCH. So do I have to separate
>> m_SANDYBRIDGE out from m_COREI7?
>
> Yes, please.
>
> Thanks.
>
> --
> H.J.

I separated the change into two patches here:

Patch1 separates PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7.
PROCESSOR_COREI7_AVX includes Sandybridge and Ivybridge.

Patch2 is the change for macro-fusion. It adds three tune features:

X86_TUNE_FUSE_CMP_AND_BRANCH_64: CORE2 only supports macro-fusion in
32-bit mode. COREI7, COREI7_AVX and Haswell also support 64-bit mode.

X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: CORE2 only supports macro-fusion
for branches checking the Zero and Carry flags. COREI7, COREI7_AVX and
Haswell also support branches checking the Sign and Overflow flags.

X86_TUNE_FUSE_ALU_AND_BRANCH: COREI7 doesn't support macro-fusion for
alu + branch. COREI7_AVX and Haswell support it.

Bootstrap and regression are OK for the two patches.

Thanks,
Wei Mi.

Patch1:
2013-09-16  Wei Mi

	* gcc/config/i386/i386-c.c (ix86_target_macros_internal): Separate
	PROCESSOR_COREI7_AVX out from PROCESSOR_COREI7.
	* gcc/config/i386/i386.c (ix86_option_override_internal): Ditto.
	(ix86_issue_rate): Ditto.
	(ia32_multipass_dfa_lookahead): Ditto.
	(ix86_sched_init_global): Ditto.
	(get_builtin_code_for_version): Ditto.
	* gcc/config/i386/i386.h (enum target_cpu_default): Ditto.
	(enum processor_type): Ditto.
	* gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 14349be..7ea68cc 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -141,6 +141,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__corei7");
       def_or_undef (parse_in, "__corei7__");
       break;
+    case PROCESSOR_COREI7_AVX:
+      def_or_undef (parse_in, "__corei7_avx");
+      def_or_undef (parse_in, "__corei7_avx__");
+      break;
     case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__core_avx2");
       def_or_undef (parse_in, "__core_avx2__");
@@ -239,6 +243,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_COREI7:
       def_or_undef (parse_in, "__tune_corei7__");
       break;
+    case PROCESSOR_COREI7_AVX:
+      def_or_undef (parse_in, "__tune_corei7_avx__");
+      break;
     case PROCESSOR_HASWELL:
       def_or_undef (parse_in, "__tune_core_avx2__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 536c357..1fd3f60 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -1908,8 +1908,9 @@ const struct processor_costs *ix86_cost = &pentium_cost;
 #define m_P4_NOCONA (m_PENT4 | m_NOCONA)
 #define m_CORE2 (1<

[...]

Patch2:
2013-09-16  Wei Mi

	* gcc/config/i386/i386.c (ix86_macro_fusion_p): New Function.
	(ix86_macro_fusion_pair_p): Ditto.
	* gcc/config/i386/i386.h: Add new tune features about macro-fusion.
	* gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
	* gcc/doc/tm.texi: Generated.
	* gcc/doc/tm.texi.in: Ditto.
	* gcc/haifa-sched.c (try_group_insn): New function.
	(group_insns_for_macro_fusion): Ditto.
	(sched_init): Call group_insns_for_macro_fusion.
	* gcc/sched-rgn.c (add_branch_dependences): Keep insns in
	a SCHED_GROUP at the end of BB to remain their location.
	* gcc/target.def: Add two hooks: macro_fusion_p and
	macro_fusion_pair_p.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1fd3f60..85b7aa0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24856,6 +24856,90 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH
+      && (!TARGET_64BIT || TARGET_FUSE_CMP_AND_BRANCH_64))
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src;
+  rtx single_set = single_set (condgen);
+  enum rtx_code ccode;
+  rtx compare_set = NULL_RTX, test_if, cond;
+
+  if (single_set == NULL_RTX
+      && !TARGET_FUSE_ALU_AND_BRANCH)
+    return false;
+
+  if (single_set != NULL_RTX)
+    compare_set = single_set;
+  else
+    {
+      int i;
+      rtx pat = PATTERN (condgen);
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	if (GET_CODE (XVECEXP (pat, 0, i)) == SET
+	    && GET_CODE (SET_SRC (X
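For readability, the three tune features described above would look
roughly like this in x86-tune.def (a sketch reconstructed from the
description; the exact entries are in the attached patch):

DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
          m_COREI7 | m_COREI7_AVX | m_HASWELL)
DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
          m_COREI7 | m_COREI7_AVX | m_HASWELL)
DEF_TUNE (X86_TUNE_FUSE_ALU_AND_BRANCH, "fuse_alu_and_branch",
          m_COREI7_AVX | m_HASWELL)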
Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
Ping.

> -----Original Message-----
> From: Wei Mi [mailto:w...@google.com]
> Sent: Thursday, September 12, 2013 2:51 AM
> To: GCC Patches
> Cc: David Li; Zamyatin, Igor
> Subject: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
>
> For the following testcase 1.c, on westmere and sandybridge, performance
> with the option -mtune-ctrl=^use_vector_fp_converts is better (improves
> from 3.46s to 2.83s). It means cvtss2sd is often better than
> unpcklps+cvtps2pd on recent x86 platforms.
>
> 1.c:
> float total = 0.2;
> int k = 5;
>
> int main() {
>   int i;
>
>   for (i = 0; i < 10; i++) {
>     total += (0.5 + k);
>   }
>
>   return total == 0.3;
> }
>
> assembly generated by gcc-r201963 without -mtune-ctrl=^use_vector_fp_converts
> .L2:
>         unpcklps %xmm0, %xmm0
>         subl    $1, %eax
>         cvtps2pd %xmm0, %xmm0
>         addsd   %xmm1, %xmm0
>         unpcklpd %xmm0, %xmm0
>         cvtpd2ps %xmm0, %xmm0
>         jne     .L2
>
> assembly generated by gcc-r201963 with -mtune-ctrl=^use_vector_fp_converts
> .L2:
>         cvtss2sd %xmm0, %xmm0
>         subl    $1, %eax
>         addsd   %xmm1, %xmm0
>         cvtsd2ss %xmm0, %xmm0
>         jne     .L2
>
> But for testcase 2.c (Thanks to Igor Zamyatin for the testcase),
> performance with the option -mtune-ctrl=^use_vector_fp_converts is worse.
> Analysis to the assembly shows the performance degradation comes from
> partial reg stall caused by cvtsd2ss. Adding pxor %xmm0, %xmm0 before
> cvtsd2ss b(,%rdx,8), %xmm0 gets the performance back.
>
> 2.c:
> double b[1024];
>
> float a[1024];
>
> int main()
> {
>   int i;
>   for(i = 0 ; i < 1024 * 1024 * 256; i++)
>     a[i & 1023] = a[i & 1023] * (float)b[i & 1023];
>   return (int)a[512];
> }
>
> without -mtune-ctrl=^use_vector_fp_converts
> .L2:
>         movl    %eax, %edx
>         addl    $1, %eax
>         andl    $1023, %edx
>         cmpl    $268435456, %eax
>         movsd   b(,%rdx,8), %xmm0
>         cvtpd2ps %xmm0, %xmm0    ==> without partial reg stall because
> of movsd.
>         mulss   a(,%rdx,4), %xmm0
>         movss   %xmm0, a(,%rdx,4)
>         jne     .L2
>
> with -mtune-ctrl=^use_vector_fp_converts
> .L2:
>         movl    %eax, %edx
>         addl    $1, %eax
>         andl    $1023, %edx
>         cmpl    $268435456, %eax
>         cvtsd2ss b(,%rdx,8), %xmm0  ==> with partial reg stall. Needs
> to insert "pxor %xmm0, %xmm0" before current insn.
>         mulss   a(,%rdx,4), %xmm0
>         movss   %xmm0, a(,%rdx,4)
>         jne     .L2
>
> So the patch is to turn off use_vector_fp_converts for m_CORE_ALL to use
> cvtss2sd/cvtsd2ss directly, and add "pxor %xmmreg %xmmreg" before
> cvtss2sd/cvtsd2ss to break partial reg stall (similar as what r201308
> does for cvtsi2ss/cvtsi2sd). bootstrap and regression pass. ok for trunk?
>
> Thanks,
> Wei Mi.
>
> 2013-09-11  Wei Mi
>
> 	* config/i386/x86-tune.def (DEF_TUNE): Remove m_CORE_ALL.
> 	* config/i386/i386.md: Add define_peephole2 to
> 	break partial reg stall for cvtss2sd/cvtsd2ss.
>
> Index: config/i386/x86-tune.def
> ===================================================================
> --- config/i386/x86-tune.def	(revision 201963)
> +++ config/i386/x86-tune.def	(working copy)
> @@ -189,7 +189,7 @@ DEF_TUNE (X86_TUNE_NOT_VECTORMODE, "not_
>  /* X86_TUNE_USE_VECTOR_FP_CONVERTS: Prefer vector packed SSE conversion
>     from FP to FP.  */
>  DEF_TUNE (X86_TUNE_USE_VECTOR_FP_CONVERTS, "use_vector_fp_converts",
> -          m_CORE_ALL | m_AMDFAM10 | m_GENERIC)
> +          m_AMDFAM10 | m_GENERIC)
>  /* X86_TUNE_USE_VECTOR_CONVERTS: Prefer vector packed SSE conversion
>     from integer to FP.  */
>  DEF_TUNE (X86_TUNE_USE_VECTOR_CONVERTS, "use_vector_converts", m_AMDFAM10)
> Index: config/i386/i386.md
> ===================================================================
> --- config/i386/i386.md	(revision 201963)
> +++ config/i386/i386.md	(working copy)
> @@ -5075,6 +5075,63 @@
>    emit_move_insn (operands[0], CONST0_RTX (mode));
>  })
>
> +;; Break partial reg stall for cvtsd2ss.
> +
> +(define_peephole2
> +  [(set (match_operand:SF 0 "register_operand")
> +        (float_truncate:SF
> +	  (match_operand:DF 1 "nonimmediate_operand")))]
> +  "TARGET_SSE2 && TARGET_SSE_MATH
> +   && TARGET_SSE_PARTIAL_REG_DEPENDENCY
> +   && optimize_function_for_speed_p (cfun)
> +   && reload_completed && SSE_REG_
Re: Revisit Core tunning flags
>> > http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
>
> This patch seems reasonable (in fact I have pretty much the same in my
> tree). use_vector_fp_converts is actually trying to solve the same
> problem on AMD hardware - you need to type the whole register when
> converting. So it may work well for AMD chips too, or maybe the
> difference is that Intel chips somehow handle "cvtpd2ps %xmm0, %xmm0"
> well even though the upper half of xmm0 is ill defined, while AMD chips
> don't.
>
> The patch seems OK. I do not see a reason for the
>   && peep2_reg_dead_p (0, operands[0])
> test. The reg has to be dead since it is the full destination of the
> operation.

Ok, I see. I will delete it.

> Lets wait a few days before commit so we know the effect of individual
> changes. I will test it on AMD hardware and we can decide on generic
> tuning then.
>
> Honza

Ok, thanks.

Wei Mi.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
> You disable fusion for Bulldozer here since you did not add it into
> TARGET_FUSE_CMP_AND_BRANCH_64.

Ok, will add it.

> Perhaps we can have TARGET_FUSE_CMP_AND_BRANCH_64 and
> TARGET_FUSE_CMP_AND_BRANCH_32, plus a macro TARGET_FUSE_CMP_AND_BRANCH
> that chooses the corresponding variant based on TARGET_64BIT, rather
> than having to wind down the test in every use.

Ok, will fix it.

>> +/* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a
>> +   subsequent conditional jump instruction when the condition jump
>> +   check sign flag (SF) or overflow flag (OF). */
>> +DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
>> +	  m_COREI7 | m_COREI7_AVX | m_HASWELL)
>
> Is this flag affecting only fusing of ALU and BRANCH, or should it also
> affect X86_TUNE_FUSE_CMP_AND_BRANCH? In the current implementation it
> seems to be the first, and in that case it ought to be documented that
> way and probably called ALT_AND_BRANCH_SOFLAGS to avoid confusion.

X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS is not affecting the fusing of ALU
and BRANCH. It is added because m_CORE2 doesn't support fusing cmp and
JL/JG/JLE/JGE.

> This is what Agner Fog says:
>
>   A CMP or TEST instruction immediately followed by a conditional jump
>   can be fused into a single macro-op. This applies to all versions of
>   the CMP and TEST instructions and all conditional jumps except if the
>   CMP or TEST has a rip-relative address or both a displacement and an
>   immediate operand.
>
> So it is a bit more weird. Perhaps you can extend your predicate to
> look for IP-relative addresses & displacements of CMP and TEST, too.
>
> Honza

Thanks for checking it. Agner's guide also mentions this constraint for
Sandybridge and Ivybridge; I missed it because the Intel optimization
reference manual doesn't mention it. I did some experiments just now and
verified that the constraint exists on Sandybridge. Will add the
predicate.

Thanks,
Wei Mi.
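An example of the constraint Agner describes - a cmp with a RIP-relative
memory operand cannot fuse with the following jump (hand-written
illustration, not from the testcase):

  cmpl   $5, k(%rip)     # RIP-relative address (plus immediate): no macro-fusion
  jne    .L3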
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
This is the updated patch2. Changed:

1. For cmp/test with a rip-relative addressing mem operand, don't group
the insns. Bulldozer also doesn't support fusion for cmp/test with both a
displacement MEM and an immediate operand, while m_CORE_ALL doesn't
support fusion for cmp/test with MEM and an immediate operand. I simply
choose to use the more stringent constraint here (m_CORE_ALL's
constraint).

2. Add Bulldozer back and merge TARGET_FUSE_CMP_AND_BRANCH_64 and
TARGET_FUSE_CMP_AND_BRANCH_32.

Bootstrap and regression pass. OK for trunk?

2013-09-24  Wei Mi

	* gcc/config/i386/i386.c (rip_relative_addr_p): New Function.
	(ix86_macro_fusion_p): Ditto.
	(ix86_macro_fusion_pair_p): Ditto.
	* gcc/config/i386/i386.h: Add new tune features about macro-fusion.
	* gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
	* gcc/doc/tm.texi: Generated.
	* gcc/doc/tm.texi.in: Ditto.
	* gcc/haifa-sched.c (try_group_insn): New Function.
	(group_insns_for_macro_fusion): Ditto.
	(sched_init): Call group_insns_for_macro_fusion.
	* gcc/sched-rgn.c (add_branch_dependences): Keep insns in
	a SCHED_GROUP at the end of BB to remain their location.
	* gcc/target.def: Add two hooks: macro_fusion_p and
	macro_fusion_pair_p.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1fd3f60..4a04778 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24856,6 +24856,167 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Extracted from ix86_print_operand_address.  Check whether ADDR is a
+   rip-relative address.  */
+
+static bool
+rip_relative_addr_p (rtx addr)
+{
+  struct ix86_address parts;
+  rtx base, index, disp;
+  int ok;
+
+  if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_VSIBADDR)
+    {
+      ok = ix86_decompose_address (XVECEXP (addr, 0, 0), &parts);
+      parts.index = XVECEXP (addr, 0, 1);
+    }
+  else if (GET_CODE (addr) == UNSPEC && XINT (addr, 1) == UNSPEC_LEA_ADDR)
+    ok = ix86_decompose_address (XVECEXP (addr, 0, 0), &parts);
+  else
+    ok = ix86_decompose_address (addr, &parts);
+
+  gcc_assert (ok);
+  base = parts.base;
+  index = parts.index;
+  disp = parts.disp;
+
+  if (TARGET_64BIT && !base && !index)
+    {
+      rtx symbol = disp;
+
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == PLUS
+	  && CONST_INT_P (XEXP (XEXP (disp, 0), 1)))
+	symbol = XEXP (XEXP (disp, 0), 0);
+
+      if (GET_CODE (symbol) == LABEL_REF
+	  || (GET_CODE (symbol) == SYMBOL_REF
+	      && SYMBOL_REF_TLS_MODEL (symbol) == 0))
+	return true;
+    }
+  if (flag_pic && !base && !index)
+    {
+      if (GET_CODE (disp) == CONST
+	  && GET_CODE (XEXP (disp, 0)) == UNSPEC
+	  && (XINT (XEXP (disp, 0), 1) == UNSPEC_PCREL
+	      || XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+	      || (TARGET_64BIT
+		  && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTNTPOFF)))
+	return true;
+    }
+  return false;
+}
+
+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src, dest;
+  rtx single_set = single_set (condgen);
+  enum rtx_code ccode;
+  rtx compare_set = NULL_RTX, test_if, cond;
+  rtx alu_set = NULL_RTX, addr = NULL_RTX;
+
+  if (get_attr_type (condgen) != TYPE_TEST
+      && get_attr_type (condgen) != TYPE_ICMP
+      && get_attr_type (condgen) != TYPE_INCDEC
+      && get_attr_type (condgen) != TYPE_ALU)
+    return false;
+
+  if (single_set == NULL_RTX
+      && !TARGET_FUSE_ALU_AND_BRANCH)
+    return false;
+
+  if (single_set != NULL_RTX)
+    compare_set = single_set;
+  else
+    {
+      int i;
+      rtx pat = PATTERN (condgen);
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	if (GET_CODE (XVECEXP (pat, 0, i)) == SET)
+	  {
+	    rtx set_src = SET_SRC (XVECEXP (pat, 0, i));
+	    if (GET_CODE (set_src) == COMPARE)
+	      compare_set = XVECEXP (pat, 0, i);
+	    else
+	      alu_set = XVECEXP (pat, 0, i);
+	  }
+    }
+  if (compare_set == NULL_RTX)
+    return false;
+  src = SET_SRC (compare_set);
+  if (GET_CODE (src) != COMPARE)
+    return false;
+
+  /* Macro-fusion for cmp/test MEM-IMM + conditional jmp is not
+     supported.  */
+  if ((MEM_P (XEXP (src, 0))
+       && CONST_INT_P (XEXP (src, 1)))
+      || (MEM_P (XEXP (src, 1))
+	  && CONST_INT_P (XEXP (src, 0
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
>> It doesn't look right. IP relative address is only possible
>> with TARGET_64BIT and
>>
>> 1. base == pc. Or
>> 2. UNSPEC_PCREL, UNSPEC_GOTPCREL, and
>> UNSPEC_GOTNTPOFF.
>
> Target 64bit should be tested above. We however output RIP addresses
> also for basic symbol references. I.e. when base is a symbol address,
> such as in:
>
>   int a;
>   int t()
>   {
>     return a;
>   }
>
> memory_address_length already contains logic to figure out if there is
> IP relative addressing going on (I am not sure it is completely accurate
> either). Better to break it out to a common predicate and perhaps unify
> with what ix86_print_operand_address is doing.
>
> Honza
>>
>>
>> --
>> H.J.

Thanks. How about this one? Bootstrap and regression are going on.

2013-09-24  Wei Mi

	* gcc/config/i386/i386.c (memory_address_length): Extract a part
	of code to rip_relative_addr_p.
	(rip_relative_addr_p): New Function.
	(ix86_macro_fusion_p): Ditto.
	(ix86_macro_fusion_pair_p): Ditto.
	* gcc/config/i386/i386.h: Add new tune features about macro-fusion.
	* gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
	* gcc/doc/tm.texi: Generated.
	* gcc/doc/tm.texi.in: Ditto.
	* gcc/haifa-sched.c (try_group_insn): New Function.
	(group_insns_for_macro_fusion): Ditto.
	(sched_init): Call group_insns_for_macro_fusion.
	* gcc/sched-rgn.c (add_branch_dependences): Keep insns in
	a SCHED_GROUP at the end of BB to remain their location.
	* gcc/target.def: Add two hooks: macro_fusion_p and
	macro_fusion_pair_p.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1fd3f60..808e0c6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24275,25 +24275,8 @@ memory_address_length (rtx addr, bool lea)
   else if (disp && !base && !index)
     {
       len += 4;
-      if (TARGET_64BIT)
-	{
-	  rtx symbol = disp;
-
-	  if (GET_CODE (disp) == CONST)
-	    symbol = XEXP (disp, 0);
-	  if (GET_CODE (symbol) == PLUS
-	      && CONST_INT_P (XEXP (symbol, 1)))
-	    symbol = XEXP (symbol, 0);
-
-	  if (GET_CODE (symbol) != LABEL_REF
-	      && (GET_CODE (symbol) != SYMBOL_REF
-		  || SYMBOL_REF_TLS_MODEL (symbol) != 0)
-	      && (GET_CODE (symbol) != UNSPEC
-		  || (XINT (symbol, 1) != UNSPEC_GOTPCREL
-		      && XINT (symbol, 1) != UNSPEC_PCREL
-		      && XINT (symbol, 1) != UNSPEC_GOTNTPOFF)))
-	    len++;
-	}
+      if (rip_relative_addr_p (&parts))
+	len++;
     }
   else
     {
@@ -24856,6 +24839,159 @@ ia32_multipass_dfa_lookahead (void)
     }
 }

+/* Check whether x86 address PARTS is a pc-relative address.  */
+
+static bool
+rip_relative_addr_p (struct ix86_address *parts)
+{
+  rtx base, index, disp;
+
+  base = parts->base;
+  index = parts->index;
+  disp = parts->disp;
+
+  if (disp && !base && !index)
+    {
+      if (TARGET_64BIT)
+	{
+	  rtx symbol = disp;
+
+	  if (GET_CODE (disp) == CONST)
+	    symbol = XEXP (disp, 0);
+	  if (GET_CODE (symbol) == PLUS
+	      && CONST_INT_P (XEXP (symbol, 1)))
+	    symbol = XEXP (symbol, 0);
+
+	  if (GET_CODE (symbol) == LABEL_REF
+	      || (GET_CODE (symbol) == SYMBOL_REF
+		  && SYMBOL_REF_TLS_MODEL (symbol) == 0)
+	      || (GET_CODE (symbol) == UNSPEC
+		  && (XINT (symbol, 1) == UNSPEC_GOTPCREL
+		      || XINT (symbol, 1) == UNSPEC_PCREL
+		      || XINT (symbol, 1) == UNSPEC_GOTNTPOFF)))
+	    return true;
+	}
+    }
+  return false;
+}
+
+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}
+
+/* Check whether current microarchitecture support macro fusion
+   for insn pair "CONDGEN + CONDJMP".  Refer to
+   "Intel Architectures Optimization Reference Manual".  */
+
+static bool
+ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
+{
+  rtx src, dest;
+  rtx single_set = single_set (condgen);
+  enum rtx_code ccode;
+  rtx compare_set = NULL_RTX, test_if, cond;
+  rtx alu_set = NULL_RTX, addr = NULL_RTX;
+
+  if (get_attr_type (condgen) != TYPE_TEST
+      && get_attr_type (condgen) != TYPE_ICMP
+      && get_attr_type (condgen) != TYPE_INCDEC
+      && get_attr_type (condgen) != TYPE_ALU)
+    return false;
+
+  if (single_set == NULL_RTX
+      && !TARGET_FUSE_ALU_AND_BRANCH)
+    return false;
+
+  if (single_set != NULL_RTX)
+    compare_set = single_set;
[PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
Hi,

This patch is to address the problem described here:
http://gcc.gnu.org/ml/gcc/2013-09/msg00187.html

The patch changes the ALLOCNO_MODE of a pseudo reg to be the outermode if
the pseudo reg is used in a paradoxical subreg, so IRA will not mistakenly
assign an operand with a bigger mode to a smaller hardreg which couldn't
find a pair register.

No test is added because I cannot create a small testcase to reproduce
the problem on trunk; the difficulty of that was described in the above
post.

Bootstrap and regression pass. OK for trunk?

Thanks,
Wei Mi.

2013-09-24  Wei Mi

	* ira-build.c (create_insn_allocnos): Fix ALLOCNO_MODE
	in the case of paradoxical subreg.

Index: ira-build.c
===================================================================
--- ira-build.c	(revision 201963)
+++ ira-build.c	(working copy)
@@ -1688,6 +1688,30 @@ create_insn_allocnos (rtx x, bool output
 	}
       return;
     }
+  else if (code == SUBREG)
+    {
+      int regno;
+      rtx subreg_reg = SUBREG_REG (x);
+      enum machine_mode outermode, innermode;
+
+      create_insn_allocnos (subreg_reg, output_p);
+      /* For paradoxical subreg, set allocno's mode to be
+	 the outermode.  */
+      outermode = GET_MODE (x);
+      innermode = GET_MODE (subreg_reg);
+      if (REG_P (subreg_reg)
+	  && (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode)))
+	{
+	  regno = REGNO (subreg_reg);
+	  if (regno >= FIRST_PSEUDO_REGISTER)
+	    {
+	      ira_allocno_t a = ira_curr_regno_allocno_map[regno];
+	      ira_assert (a != NULL);
+	      ALLOCNO_MODE (a) = outermode;
+	    }
+	}
+      return;
+    }
   else if (code == SET)
     {
       create_insn_allocnos (SET_DEST (x), true);
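To make the failure mode concrete, borrowing the RTL shape from a later
message in this thread: a DImode pseudo used under a TImode paradoxical
subreg,

  (insn 5 4 7 2 (set (reg:TI 106)
          (subreg:TI (reg:DI 107) 0)))

needs two hard registers at the subreg use, but IRA sizes the allocno by
the DImode ALLOCNO_MODE, so it may pick a register such as R15 that has no
pair register to cover the TImode access.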
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> performance. For example, we have code
>
>   ... (reg:DI) ...
>   ...
>   ... (subreg:TI (reg:DI))
>   ...
>   ... (reg:DI)
>
> We need two hard regs only for the second place, by transforming
>
>   p = (reg:DI)
>
>   ... (subreg:TI p)
>
> With this patch we require two hard regs for the whole live range of the
> original pseudo (which can be quite long). It might considerably worsen
> code performance.
>
> So the problem could be fixed in LRA, which can make this
> transformation, or even in IRA (that would be even better as we put
> pseudo P into the global picture vs. probably spilling some pseudos for
> P in LRA).

Thanks for your detailed explanation. Now I understand what you are
concerned about here.

> I need some time to think what is better (e.g. I don't know how to
> implement it in IRA without a big compilation slowdown). In any case,
> the solution for the problem will not be as easy as in the patch.

To fix it in IRA, it looks like we want a live range splitting pass for
pseudos used in paradoxical subregs. Is the potential compilation
slowdown you mention here caused by more allocnos introduced by the live
range splitting, or by something else?

Thanks,
Wei Mi.
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> To define for what occurrence of the pseudo we should do the
> transformation, we need to create allocnos and calculate reg classes to
> know what paradoxical subreg needs more hard regs (the transformations
> can not be done for all paradoxical subregs as my experience shows many
> RTL changes result in worse RA even if we have heuristics to remove the
> generated changes as in this case would be trying to assign the same
> hard reg for the original and the new pseudo).
>
> After doing the transformations, we need to recalculate reg classes
> and rebuild allocnos (both are expensive). To speed up the process it
> could be implemented as some kind of update of already existing data but
> it will complicate code much.

I see, thanks!

> So right now I think implementing this in LRA would be easier. Still LRA
> has a pretty good register (re-)allocation (although it is worse than in
> IRA).

When you get an idea how to fix it in LRA, if you are still busy, I would
be happy to do the implementation if you could brief your idea.

Thanks,
Wei Mi.
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> Probably the best place to add a code for this is in
> lra-constraints.c::simplify_operand_subreg by permitting subreg reload
> for paradoxical subregs whose hard regs are not fully in allocno class
> of the inner pseudo.
>
> It needs a good testing (i'd check that the generated code is not
> changed on variety benchmarks to see that the change has no impact on
> the most programs performance) and you need to add a good comment
> describing why this change is needed.

Vlad, thanks! I made another patch here by following your guidance.
Please check whether it is OK. Bootstrap and regression OK. I am also
verifying its performance effect on google applications (but most of them
are 64 bits, so I cannot verify its performance effect on 32-bit apps).

The idea of the patch is here: for the following two types of paradoxical
subreg, we insert a reload in simplify_operand_subreg:

1. If the op_type is OP_IN, and the hardreg could not be paired with
another hardreg to contain the outermode operand, for example R15 in
x86-64 (checked by in_hard_reg_set_p), we need to insert a reload. If the
hardreg allocated in IRA is R12, we don't need to insert a reload here,
because the upper half of an rvalue paradoxical subreg is undefined, so
it is OK for R13 to contain undefined data.

2. If the op_type is OP_OUT or OP_INOUT. (It is possible that we don't
need to insert a reload for this case too, because the upper half of an
lvalue paradoxical subreg is useless. If the assignment to the upper half
of the subreg register will not be generated by the rtl split4 stage, we
don't need to insert a reload here. But I haven't got a testcase to
verify it, so I keep it.)

Here is a paradoxical subreg example showing how the reload is generated:

  (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
          (subreg:TI (reg:DI 107 [ __comp ]) 0)) {*movti_internal_rex64}

In IRA, reg107 is allocated to a DImode hardreg. If reg107 is assigned to
hardreg R15, the compiler cannot find another hardreg to pair with R15 to
contain TImode data. So we insert a TImode reload pseudo reg180 for it.

After the reload is inserted:

  (insn 283 0 0 (set (subreg:DI (reg:TI 180 [orig:107 __comp ] [107]) 0)
          (reg:DI 107 [ __comp ])) -1

  (insn 5 4 7 2 (set (reg:TI 106 [ __comp ])
          (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0))
          {*movti_internal_rex64}

Two reload hard registers will be allocated to reg180 to hold the TImode
operand in LRA assign.

Thanks,
Wei Mi.

2013-09-30  Wei Mi

	* lra-constraints.c (insert_move_for_subreg): New function.
	(simplify_operand_subreg): Add reload for paradoxical subreg.

Index: lra-constraints.c
===================================================================
--- lra-constraints.c	(revision 201963)
+++ lra-constraints.c	(working copy)
@@ -1158,6 +1158,30 @@ process_addr_reg (rtx *loc, rtx *before,
   return true;
 }

+/* Insert move insn in simplify_operand_subreg.  BEFORE returns
+   the insn to be inserted before curr insn.  AFTER returns the
+   insn to be inserted after curr insn.  ORIGREG and NEWREG
+   are the original reg and new reg for reload.  */
+static void
+insert_move_for_subreg (rtx *before, rtx *after, rtx origreg, rtx newreg)
+{
+  if (before)
+    {
+      push_to_sequence (*before);
+      lra_emit_move (newreg, origreg);
+      *before = get_insns ();
+      end_sequence ();
+    }
+  if (after)
+    {
+      start_sequence ();
+      lra_emit_move (origreg, newreg);
+      emit_insn (*after);
+      *after = get_insns ();
+      end_sequence ();
+    }
+}
+
 /* Make reloads for subreg in operand NOP with internal subreg mode
    REG_MODE, add new reloads for further processing.  Return true if
    any reload was generated.  */
@@ -1169,6 +1193,8 @@ simplify_operand_subreg (int nop, enum m
   enum machine_mode mode;
   rtx reg, new_reg;
   rtx operand = *curr_id->operand_loc[nop];
+  enum reg_class regclass;
+  enum op_type type;

   before = after = NULL_RTX;

@@ -1177,6 +1203,7 @@ simplify_operand_subreg (int nop, enum m

   mode = GET_MODE (operand);
   reg = SUBREG_REG (operand);
+  type = curr_static_id->operand[nop].type;
   /* If we change address for paradoxical subreg of memory, the
      address might violate the necessary alignment or the access might
      be slow.  So take this into consideration.  We should not worry
@@ -1215,13 +1242,9 @@ simplify_operand_subreg (int nop, enum m
 	   && (hard_regno_nregs[hard_regno][GET_MODE (reg)]
 	       >= hard_regno_nregs[hard_regno][mode])
 	   && simplify_subreg_regno (hard_regno, GET_MODE (reg),
-				     SUBREG_BYTE (operand), mode) < 0
-	   /* Don't reload subreg for matching reload.  It is actually
-	      valid subreg in LRA.  */
-	   && ! LRA_SUBREG_P (operand))
+				     SUBREG_BYTE (operand), mode) < 0)
 	  || CONSTANT_P (reg) || GET_CODE (reg) == PLUS || MEM_P (reg
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> Please check whether it is ok. Bootstrap and regression ok. I am also
> verifying its performance effect on google applications (but most of
> them are 64-bit, so I cannot verify its performance effect on 32-bit
> apps).

I have verified it has no performance impact on google applications.

Thanks,
Wei Mi.
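For illustration, here is a minimal C sketch (not from the thread; the function name is invented) of the kind of input that can expose an rvalue TImode paradoxical subreg of a DImode pseudo like insn 5 above. Whether the RA actually sees the subreg depends on the target and optimization level:

__int128 __attribute__ ((noinline))
widen (unsigned long long lo)
{
  /* u.d initializes only the low 64 bits; the later read of u.t is the
     rvalue-with-undefined-upper-half case described in point 1 above.  */
  union { unsigned long long d; __int128 t; } u;
  u.d = lo;
  return u.t;
}

Under -O2 on x86-64 the read of u.t can be represented as (subreg:TI (reg:DI ...) 0), which is the shape the new reload code handles.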
Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
> Hi Wei Mi,
>
> Have you checked in your patch?
>
> --
> H.J.

No, I haven't. Honza wants me to wait for his testing on AMD hardware.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
Re: [PATCH] disable use_vector_fp_converts for m_CORE_ALL
On Tue, Oct 1, 2013 at 3:50 PM, Jan Hubicka wrote:
>> > Hi Wei Mi,
>> >
>> > Have you checked in your patch?
>> >
>> > --
>> > H.J.
>>
>> No, I haven't. Honza wants me to wait for his testing on AMD hardware.
>> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01603.html
> I only wanted to separate it from the changes in generic so the regular
> testers can pick it up separately. So just go ahead and check it in.
>
> Honza

Thanks, checked in as r203095.

Wei Mi.
Re: [asan] Emit GIMPLE directly, small cleanups
Hi Diego,

>> /* Build
>> - (base_addr >> ASAN_SHADOW_SHIFT) | targetm.asan_shadow_offset (). */
>> + (base_addr >> ASAN_SHADOW_SHIFT) + targetm.asan_shadow_offset (). */
>
> Hm, I wonder if this is a documentation problem or we're generating bad
> runtime code. Wei, you tested the runtime and it was working with the GCC
> generated code, right?
>
> In any case, we can adjust the expression later.

I only tested my small testcase and it worked. Because the redzones are usually not very small areas (more than 4K), I think it is possible the testcase works even if the shadow addr calculation is incorrect and has a small deviation.

Thanks,
Wei.
Re: [asan] Emit GIMPLE directly, small cleanups
Hi,

Here are the initial test results of the gcc asan patch; they show some features missing in gcc but existing in llvm.

[1]. gcc regression test for gcc-asan passes.
[2]. llvm regression tests for gcc-asan: 18 failures in 123 for tests written in google test and 24 failures in 28 for tests written in lit tests.

gcc missing features:
1. gcc implementation doesn't support stack/global overflow check.
2. gcc implementation doesn't support some attributes, such as __attribute__((no_address_safety_analysis)), which llvm does.
3. gcc doesn't detect out-of-bound bitfield access of heap, which llvm does.
4. gcc doesn't detect out-of-bound memset, memcopy, strcat and strcpy for heap allocated memory or string, which llvm does.
5. gcc doesn't contain similar options: -mllvm -asan-blacklist, -mllvm -asan-initialization-order.
6. gcc -fasan doesn't take effect at -O0, but llvm does. Most lit tests contain checks from -O0 to -O3, which makes gcc fail.
7. $HOME/llvm/trunk/projects/compiler-rt/lib/asan/scripts/asan_symbolize.py can generate valid source locations from virtual addresses for an llvm binary, but fails to do that for a gcc binary. For example, llvm result: #1 0x402694 in main heap-overflow.cc:23 vs. gcc result: #1 0x402694 in main ??:0. Some FileCheck in llvm lit tests expect the valid source locations.

Thanks,
Wei.

On Thu, Oct 11, 2012 at 10:31 AM, Jakub Jelinek wrote:
> On Thu, Oct 11, 2012 at 01:14:31PM -0400, Diego Novillo wrote:
>> On 2012-10-11 12:38 , Jakub Jelinek wrote:
>> >- gimple_seq seq, stmts;
>> >- tree shadow_type = size_in_bytes == 16 ?
>> >- short_integer_type_node : char_type_node;
>> >- tree shadow_ptr_type = build_pointer_type (shadow_type);
>> >- tree uintptr_type = lang_hooks.types.type_for_mode (ptr_mode,
>> >- /*unsignedp=*/true);
>> >+ tree shadow_ptr_type = shadow_ptr_types[size_in_bytes == 16];
>>
>> Add '? 1 : 0' in the array index expression.
>
> Ok.
>
>> >/* Build
>> >- (base_addr >> ASAN_SHADOW_SHIFT) | targetm.asan_shadow_offset (). */
>> >+ (base_addr >> ASAN_SHADOW_SHIFT) + targetm.asan_shadow_offset (). */
>>
>> Hm, I wonder if this is a documentation problem or we're generating
>> bad runtime code. Wei, you tested the runtime and it was working
>> with the GCC generated code, right?
>
> The asan web pages document |, the old tree-asan.c emitted +, I've changed
> it to BIT_IOR_EXPR, but that resulted in worse assembly, and I believe at
> least for the current x86_64 and i686 address ranges and shadow offset
> values it actually doesn't matter.
> On x86_64 stack is like 0x76e0, shifted down by 3 is still smaller
> than 1L << 44 that is ored or added to it. And the negative half of the
> address space is used by the kernel, nothing is mapped into it (besides
> vsyscall page) and neither | nor + of 1L << 44 to it would work well.
> On i386, | and + work the same for all addresses, as 0xffffffffU >> 3
> is still smaller than 1 << 29.
> The reason why + generates better code on x86_64/i686 is that one can use
> e.g. movzbl (%r1, %r2), %r3 instead of orq %r2, %r1; movzb (%r1), %r3.
> >> >+ if (shadow_ptr_types[0] == NULL_TREE) >> >+{ >> >+ alias_set_type set = new_alias_set (); >> >+ shadow_ptr_types[0] >> >+= build_distinct_type_copy (unsigned_char_type_node); >> >+ TYPE_ALIAS_SET (shadow_ptr_types[0]) = set; >> >+ shadow_ptr_types[0] = build_pointer_type (shadow_ptr_types[0]); >> >+ shadow_ptr_types[1] >> >+= build_distinct_type_copy (short_unsigned_type_node); >> >+ TYPE_ALIAS_SET (shadow_ptr_types[1]) = set; >> >+ shadow_ptr_types[1] = build_pointer_type (shadow_ptr_types[1]); >> >+} >> >> Move this to an initialization function, please. > > Okay. > > Jakub
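To make the | vs. + discussion above concrete, here is a minimal sketch (not GCC source; the helper name is invented) of the shadow-byte computation with the x86_64 parameters quoted in the thread, shift 3 and offset 1UL << 44. For the user-space address ranges Jakub describes, both operators give the same result, and + lets the backend fold the offset into a base+index addressing mode:

#include <stdint.h>

/* shadow = (addr >> ASAN_SHADOW_SHIFT) + offset; the shift and offset
   used here are the x86_64 values from the thread, not
   target-independent constants.  */
static inline uint8_t *
shadow_byte_for (uintptr_t addr)
{
  return (uint8_t *) ((addr >> 3) + ((uintptr_t) 1 << 44));
}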
Re: [asan] migrate runtime from llvm
> > This is a good start. Can you send a patch out without including > libasan so at least the toplevel parts can be reviewed easier? > Also the changelog entry for gcc.c go under the gcc/ChangeLog rather > than the toplevel one. > > Thanks, > Andrew Pinski Sure, I attach it. Thanks for pointing out the changelog error. Thanks, Wei. Index: Makefile.in === --- Makefile.in (revision 192487) +++ Makefile.in (working copy) @@ -575,7 +575,7 @@ all: # This is the list of directories that may be needed in RPATH_ENVVAR # so that programs built for the target machine work. -TARGET_LIB_PATH = $(TARGET_LIB_PATH_libstdc++-v3)$(TARGET_LIB_PATH_libmudflap)$(TARGET_LIB_PATH_libssp)$(TARGET_LIB_PATH_libgomp)$(TARGET_LIB_PATH_libitm)$(TARGET_LIB_PATH_libatomic)$(HOST_LIB_PATH_gcc) +TARGET_LIB_PATH = $(TARGET_LIB_PATH_libstdc++-v3)$(TARGET_LIB_PATH_libmudflap)$(TARGET_LIB_PATH_libasan)$(TARGET_LIB_PATH_libssp)$(TARGET_LIB_PATH_libgomp)$(TARGET_LIB_PATH_libitm)$(TARGET_LIB_PATH_libatomic)$(HOST_LIB_PATH_gcc) @if target-libstdc++-v3 TARGET_LIB_PATH_libstdc++-v3 = $$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs: @@ -585,6 +585,10 @@ TARGET_LIB_PATH_libstdc++-v3 = $$r/$(TAR TARGET_LIB_PATH_libmudflap = $$r/$(TARGET_SUBDIR)/libmudflap/.libs: @endif target-libmudflap +@if target-libasan +TARGET_LIB_PATH_libasan = $$r/$(TARGET_SUBDIR)/libasan/.libs: +@endif target-libasan + @if target-libssp TARGET_LIB_PATH_libssp = $$r/$(TARGET_SUBDIR)/libssp/.libs: @endif target-libssp @@ -914,6 +918,7 @@ configure-host: \ configure-target: \ maybe-configure-target-libstdc++-v3 \ maybe-configure-target-libmudflap \ +maybe-configure-target-libasan \ maybe-configure-target-libssp \ maybe-configure-target-newlib \ maybe-configure-target-libgcc \ @@ -1062,6 +1067,7 @@ all-host: maybe-all-lto-plugin all-target: maybe-all-target-libstdc++-v3 @endif target-libstdc++-v3-no-bootstrap all-target: maybe-all-target-libmudflap +all-target: maybe-all-target-libasan all-target: maybe-all-target-libssp all-target: maybe-all-target-newlib @if target-libgcc-no-bootstrap @@ -1152,6 +1158,7 @@ info-host: maybe-info-lto-plugin info-target: maybe-info-target-libstdc++-v3 info-target: maybe-info-target-libmudflap +info-target: maybe-info-target-libasan info-target: maybe-info-target-libssp info-target: maybe-info-target-newlib info-target: maybe-info-target-libgcc @@ -1233,6 +1240,7 @@ dvi-host: maybe-dvi-lto-plugin dvi-target: maybe-dvi-target-libstdc++-v3 dvi-target: maybe-dvi-target-libmudflap +dvi-target: maybe-dvi-target-libasan dvi-target: maybe-dvi-target-libssp dvi-target: maybe-dvi-target-newlib dvi-target: maybe-dvi-target-libgcc @@ -1314,6 +1322,7 @@ pdf-host: maybe-pdf-lto-plugin pdf-target: maybe-pdf-target-libstdc++-v3 pdf-target: maybe-pdf-target-libmudflap +pdf-target: maybe-pdf-target-libasan pdf-target: maybe-pdf-target-libssp pdf-target: maybe-pdf-target-newlib pdf-target: maybe-pdf-target-libgcc @@ -1395,6 +1404,7 @@ html-host: maybe-html-lto-plugin html-target: maybe-html-target-libstdc++-v3 html-target: maybe-html-target-libmudflap +html-target: maybe-html-target-libasan html-target: maybe-html-target-libssp html-target: maybe-html-target-newlib html-target: maybe-html-target-libgcc @@ -1476,6 +1486,7 @@ TAGS-host: maybe-TAGS-lto-plugin TAGS-target: maybe-TAGS-target-libstdc++-v3 TAGS-target: maybe-TAGS-target-libmudflap +TAGS-target: maybe-TAGS-target-libasan TAGS-target: maybe-TAGS-target-libssp TAGS-target: maybe-TAGS-target-newlib TAGS-target: maybe-TAGS-target-libgcc @@ -1557,6 +1568,7 @@ install-info-host: maybe-install-info-lt 
install-info-target: maybe-install-info-target-libstdc++-v3 install-info-target: maybe-install-info-target-libmudflap +install-info-target: maybe-install-info-target-libasan install-info-target: maybe-install-info-target-libssp install-info-target: maybe-install-info-target-newlib install-info-target: maybe-install-info-target-libgcc @@ -1638,6 +1650,7 @@ install-pdf-host: maybe-install-pdf-lto- install-pdf-target: maybe-install-pdf-target-libstdc++-v3 install-pdf-target: maybe-install-pdf-target-libmudflap +install-pdf-target: maybe-install-pdf-target-libasan install-pdf-target: maybe-install-pdf-target-libssp install-pdf-target: maybe-install-pdf-target-newlib install-pdf-target: maybe-install-pdf-target-libgcc @@ -1719,6 +1732,7 @@ install-html-host: maybe-install-html-lt install-html-target: maybe-install-html-target-libstdc++-v3 install-html-target: maybe-install-html-target-libmudflap +install-html-target: maybe-install-html-target-libasan install-html-target: maybe-install-html-target-libssp install-html-target: maybe-install-html-target-newlib install-html-target: maybe-install-html-target-libgcc @@ -1800,6 +1814,7 @@ installcheck-host: maybe-installcheck-lt installcheck-target: maybe-installcheck-target-libstdc++-v3 installcheck-targ
Re: [asan] migrate runtime from llvm
On Thu, Oct 18, 2012 at 11:16 AM, Jakub Jelinek wrote:
> On Thu, Oct 18, 2012 at 09:46:36AM -0700, Wei Mi wrote:
>> --- gcc/gcc.c (revision 192567)
>> +++ gcc/gcc.c (working copy)
>> @@ -679,6 +679,7 @@ proper position among the other output f
>> %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
>> %(mflib) " STACK_SPLIT_SPEC "\
>> %{fprofile-arcs|fprofile-generate*|coverage:-lgcov}\
>> +%{fasan:-lasan -lpthread -ldl}\
>> %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
>> %{!nostdlib:%{!nostartfiles:%E}} %{T*} }}"
>> #endif
>
> Sorry for not mentioning it earlier at once, but -lpthread -ldl shouldn't
> be there either, the -fasan compiled code makes no direct calls to
> -lpthread nor -ldl, just libasan.
>
> Jakub

Thanks, I moved those options to libasan's LDFLAGS.

Wei.
Re: [asan] migrate runtime from llvm
David,

I put the m4 subdir under libasan because once I used the .m4 files (libtool.m4 lt~obsolete.m4 ltoptions.m4 ltsugar.m4 ltversion.m4) and ltmain.sh under $topsrcdir, the problem you met yesterday, a bad libtool being generated under $topbuilddir/x86_64-unknown-linux-gnu/libasan, appeared again. That is why I had to generate the new libtool m4 files and ltmain.sh using libtoolize.

Thanks,
Wei.

On Fri, Oct 19, 2012 at 10:16 AM, Xinliang David Li wrote:
> I tried it, and this version works for me.
>
> You probably do not need to add the m4 subdir under libasan. The
> required m4 files are either in .. or ../config dir. See how
> libmudflap does it.
>
> Other than that, if there are no other comments, the change is good to
> check into the branch. Remaining bugs can always be found and fixed
> later.
>
> thanks,
>
> David
>
> On Thu, Oct 18, 2012 at 8:04 PM, Wei Mi wrote:
>> Hi,
>>
>> David caught a problem in the last patch when he tried to build
>> libasan. The problem was that an incomplete libtool was generated
>> under the libasan build directory. The cause is that the old patch used
>> an old ltmain.sh to generate libtool. I fixed it and attach a new patch.
>> The new patch also moves -lpthread and -ldl to libasan LDFLAGS.
>>
>> Thanks,
>> Wei.
[asan] a small patch to fix bogus error about global buffer overflow
Hi,

A small patch to remove the bogus error reports exposed in the spec2000 testing. In varasm.c, asan_protected should be equivalent to asan_protect_global (decl) all the time, or else the compiler will not insert redzones for some globals planned to be protected.

gcc/ChangeLog:
2012-10-25 Wei Mi

A small fix to remove bogus error report of global buffer overflow.
* varasm.c: correct the condition of asan_protected being true.

Index: varasm.c === --- varasm.c(revision 192822) +++ varasm.c(working copy) @@ -1991,11 +1991,10 @@ assemble_variable (tree decl, int top_le align_variable (decl, dont_output_data); if (flag_asan - && asan_protect_global (decl) - && DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT) + && asan_protect_global (decl)) { asan_protected = true; - DECL_ALIGN (decl) = ASAN_RED_ZONE_SIZE * BITS_PER_UNIT; + DECL_ALIGN (decl) = MAX (DECL_ALIGN (decl), ASAN_RED_ZONE_SIZE * BITS_PER_UNIT); } set_mem_align (decl_rtl, DECL_ALIGN (decl));

Thanks,
Wei.
Re: [asan] a small patch to fix bogus error about global buffer overflow
Hi, Thanks for all the comments. Fixed. Ok to checkin? 2012-10-25 Wei Mi * varasm.c (assemble_variable): Set asan_protected even for decls that are already ASAN_RED_ZONE_SIZE or more bytes aligned. Index: varasm.c === --- varasm.c(revision 192822) +++ varasm.c(working copy) @@ -1991,11 +1991,11 @@ assemble_variable (tree decl, int top_le align_variable (decl, dont_output_data); if (flag_asan - && asan_protect_global (decl) - && DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT) + && asan_protect_global (decl)) { asan_protected = true; - DECL_ALIGN (decl) = ASAN_RED_ZONE_SIZE * BITS_PER_UNIT; + DECL_ALIGN (decl) = MAX (DECL_ALIGN (decl), + ASAN_RED_ZONE_SIZE * BITS_PER_UNIT); } set_mem_align (decl_rtl, DECL_ALIGN (decl)); On Thu, Oct 25, 2012 at 2:46 PM, Jakub Jelinek wrote: > On Thu, Oct 25, 2012 at 02:32:33PM -0700, Wei Mi wrote: >> A small patch to remove the bogus error reports exposed in the >> spec2000 testing. In varasm.c, asan_protected should be equivalent >> with asan_protect_global (decl) all the time, or else compiler will >> not insert redzones for some globals planned to be protected. >> >> gcc/ChangeLog: >> 2012-10-25 Wei Mi >> >> A small fix to remove bogus error report of global buffer overflow. >> * varasm.c: correct the condition of asan_protected being true. > > The patch is almost ok, the ChangeLog entry is not. > Two instead of 3 spaces between date and name, no need for introductory > comment above * varasm.c line, name of modified function and capital > letter after :. > Perhaps > * varasm.c (assemble_variable): Set asan_protected even for decls > that are already ASAN_RED_ZONE_SIZE or more bytes aligned. > > Ok with that Change and: > >> --- varasm.c(revision 192822) >> +++ varasm.c(working copy) >> @@ -1991,11 +1991,10 @@ assemble_variable (tree decl, int top_le >>align_variable (decl, dont_output_data); >> >>if (flag_asan >> - && asan_protect_global (decl) >> - && DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT) >> + && asan_protect_global (decl)) >> { >>asan_protected = true; >> - DECL_ALIGN (decl) = ASAN_RED_ZONE_SIZE * BITS_PER_UNIT; >> + DECL_ALIGN (decl) = MAX (DECL_ALIGN (decl), ASAN_RED_ZONE_SIZE > * BITS_PER_UNIT); > > Too long line, put ASAN_RED_ZONE_SIZE * BITS_PER_UNIT on next > line below the second DECL_ALIGN (decl). > > Jakub
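A hypothetical global showing why the MAX form matters (assuming ASAN_RED_ZONE_SIZE is 32 bytes, as on this branch): with the old condition, a decl that was already aligned to the redzone size or more skipped the whole block, so asan_protected stayed false and no redzone was recorded for it, even though asan_protect_global (decl) was true:

/* Already 32-byte aligned, so the old
   DECL_ALIGN (decl) < ASAN_RED_ZONE_SIZE * BITS_PER_UNIT
   test was false and asan_protected was never set for this decl.  */
char buf[100] __attribute__ ((aligned (32)));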
[tsan] ThreadSanitizer instrumentation part
Hi,

The patch is about ThreadSanitizer. ThreadSanitizer is a data race detector for C/C++ programs. It contains two parts: instrumentation and runtime library. This patch is the first part; the runtime will be included in the second part. Dmitry (dvyu...@google.com) is the author of this part, and I am migrating it to trunk. Ok for trunk?

gcc/ChangeLog:
2012-10-31 Wei Mi

* Makefile.in (tsan.o): New.
* passes.c (init_optimization_passes): Add tsan passes.
* tree-pass.h (register_pass_info): Ditto.
* cfghooks.h (GCC_CFGHOOKS_H): Avoid including duplicate headers.
* doc/invoke.texi: Document tsan related options.
* toplev.c (compile_file): Add tsan pass in driver.
* gcc.c (LINK_COMMAND_SPEC): Add -lasan in link command if
-ftsan is on.
* tsan.c: New file about tsan.
* tsan.h: Ditto.

Please check the following links for background:
http://code.google.com/p/data-race-test
http://gcc.gnu.org/wiki/cauldron2012?action=AttachFile&do=get&target=kcc.pdf (the second half is about ThreadSanitizer).

A small testcase race_on_heap.cc is attached to show its functionality. Running the small testcase with -ftsan produces the following warning:

WARNING: ThreadSanitizer: data race (pid=5978)
Write of size 4 at 0x7d0600039040 by thread 3:
#0 Thread2(void*) ??:0 (exe+0x52c0)
Previous write of size 4 at 0x7d0600039040 by thread 2:
#0 Thread1(void*) ??:0 (exe+0x527d)
Location is heap block of size 99 at 0x7d0600039030 allocated by thread 1:
#0 malloc /usr/local/google/home/wmi/Work/llvm-main/trunk/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:293 (exe+0xe9ce)
#1 alloc() ??:0 (exe+0x52fc)
#2 AllocThread(void*) ??:0 (exe+0x532c)
Thread 3 (tid=5981, running) created at:
#0 pthread_create /usr/local/google/home/wmi/Work/llvm-main/trunk/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:645 (exe+0xbf1d)
#1 main ??:0 (exe+0x5433)
Thread 2 (tid=5980, finished) created at:
#0 pthread_create /usr/local/google/home/wmi/Work/llvm-main/trunk/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:645 (exe+0xbf1d)
#1 main ??:0 (exe+0x5400)
Thread 1 (tid=5979, finished) created at:
#0 pthread_create /usr/local/google/home/wmi/Work/llvm-main/trunk/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:645 (exe+0xbf1d)
#1 main ??:0 (exe+0x5384)

Thanks,
Wei.
Index: gcc/Makefile.in === --- gcc/Makefile.in (revision 193016) +++ gcc/Makefile.in (working copy) @@ -1351,6 +1351,7 @@ OBJS = \ trans-mem.o \ tree-affine.o \ tree-call-cdce.o \ + tsan.o \ tree-cfg.o \ tree-cfgcleanup.o \ tree-chrec.o \ @@ -2618,6 +2619,12 @@ tree-nomudflap.o : $(CONFIG_H) $(SYSTEM_ $(C_TREE_H) $(C_COMMON_H) $(GIMPLE_H) $(DIAGNOSTIC_H) $(HASHTAB_H) \ output.h langhooks.h tree-mudflap.h $(TM_H) coretypes.h \ $(GGC_H) gt-tree-mudflap.h $(TREE_PASS_H) $(DIAGNOSTIC_CORE_H) +tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TREE_INLINE_H) \ + $(GIMPLE_H) $(DIAGNOSTIC_H) langhooks.h \ + $(TM_H) coretypes.h $(TREE_DUMP_H) $(TREE_PASS_H) $(CGRAPH_H) $(GGC_H) \ + $(BASIC_BLOCK_H) $(FLAGS_H) $(FUNCTION_H) \ + $(TM_P_H) $(TREE_FLOW_H) $(DIAGNOSTIC_CORE_H) $(GIMPLE_H) tree-iterator.h \ + intl.h cfghooks.h output.h options.h c-family/c-common.h tsan.h tree-pretty-print.o : tree-pretty-print.c $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(DIAGNOSTIC_H) $(HASHTAB_H) $(TREE_FLOW_H) \ $(TM_H) coretypes.h dumpfile.h tree-iterator.h $(SCEV_H) langhooks.h \ @@ -2671,7 +2678,8 @@ toplev.o : toplev.c $(CONFIG_H) $(SYSTEM $(CGRAPH_H) $(COVERAGE_H) alloc-pool.h $(GGC_H) \ $(OPTS_H) params.def tree-mudflap.h $(TREE_PASS_H) $(GIMPLE_H) \ tree-ssa-alias.h $(PLUGIN_H) realmpfr.h tree-diagnostic.h \ - $(TREE_PRETTY_PRINT_H) opts-diagnostic.h $(COMMON_TARGET_H) + $(TREE_PRETTY_PRINT_H) opts-diagnostic.h $(COMMON_TARGET_H) \ + tsan.h hwint.o : hwint.c $(CONFIG_H) $(SYSTEM_H) $(DIAGNOSTIC_CORE_H) Index: gcc/passes.c === --- gcc/passes.c(revision 193016) +++ gcc/passes.c(working copy) @@ -1439,6 +1439,7 @@ init_optimization_passes (void) NEXT_PASS (pass_split_crit_edges); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); + NEXT_PASS (pass_tsan); NEXT_PASS (pass_tree_loop); { struct opt_pass **p = &pass_tree_loop.pass.sub; @@ -1544,6 +1545,7 @@ init_optimization_passes (void) NEXT_PASS (pass_tm_edges); } NEXT_PASS (pass_lower_complex_O0); + NEXT_PASS (pass_tsan_O0); NEXT_PASS (pass_cleanup_eh); NEXT_PASS (pass_lower_resx); NEXT_PASS (pass_nrv); Index: gcc/tree-pass.h ===
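For reference, a minimal data race of the kind the report above flags (a sketch, not the attached race_on_heap.cc): two threads write the same location without synchronization, so a build with -ftsan and the runtime linked in should produce a similar "data race" warning:

#include <pthread.h>

static int g;

static void *
writer (void *arg)
{
  g = (int) (long) arg;   /* unsynchronized write: racy */
  return 0;
}

int
main (void)
{
  pthread_t t1, t2;
  pthread_create (&t1, 0, writer, (void *) 1);
  pthread_create (&t2, 0, writer, (void *) 2);
  pthread_join (t1, 0);
  pthread_join (t2, 0);
  return g;
}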
Re: [PATCH, IRA] Fix ALLOCNO_MODE in the case of paradoxical subreg.
> You removed the condition with LRA_SUBREG for non-paradoxical subreg
> generated for matched operands. I think that is an important condition and
> the comment says why. There are some 32-bit insn constraints requiring
> different modes (int and fp ones) for matching operands in FP regs. The
> condition prevents LRA cycling as such a subreg can look invalid (in
> simplify_subreg_regno) but it is used internally in LRA for matching
> constraint expressions and should not be reloaded.
>
> With this change the patch is ok for me.

Thank you very much for the review! Patch fixed according to your comments and committed as r203169.

Regards,
Wei Mi.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Tue, Sep 24, 2013 at 4:32 PM, Wei Mi wrote: >>> It doesn't look right. IP relative address is only possible >>> with TARGET_64BIT and >>> >>> 1. base == pc. Or >>> 2. UUNSPEC_PCREL, UNSPEC_GOTPCREL, and >>> NSPEC_GOTNTPOFF. >> >> Target 64bit should be tested above. We however output RIP addresses >> also for basic symbol references. I.e. when base is an symbol addresss. >> such as in: >> int a; >> int t() >> { >> return a; >> } >> >> memory_address_length already contains logic to figure out if there is IP >> relative addressing going on (I am not sure it is completely accurate >> either). >> Better to break it out to a common predicate and perhaps unify with what >> ix86_print_operand_address is doing. >> >> Honza >>> >>> >>> -- >>> H.J. > > Thanks. How about this one. bootstrap and regression are going on. > Ccing scheduler maintainers. Ping. Repaste the patch with some minor error fixed. bootstrap and regression ok. Ok for trunk? Thanks, Wei Mi. 2013-10-03 Wei Mi * gcc/config/i386/i386.c (memory_address_length): Extract a part of code to rip_relative_addr_p. (rip_relative_addr_p): New Function. (ix86_macro_fusion_p): Ditto. (ix86_macro_fusion_pair_p): Ditto. * gcc/config/i386/i386.h: Add new tune features about macro-fusion. * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto. * gcc/doc/tm.texi: Generated. * gcc/doc/tm.texi.in: Ditto. * gcc/haifa-sched.c (try_group_insn): New Function. (group_insns_for_macro_fusion): Ditto. (sched_init): Call group_insns_for_macro_fusion. * gcc/sched-rgn.c (add_branch_dependences): Keep insns in a SCHED_GROUP at the end of BB to remain their location. * gcc/target.def: Add two hooks: macro_fusion_p and macro_fusion_pair_p. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1fd3f60..59b0bcf 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24204,6 +24204,42 @@ ix86_instantiate_decls (void) instantiate_decl_rtl (s->rtl); } +/* Check whether x86 address PARTS is a pc-relative address. */ + +static bool +rip_relative_addr_p (struct ix86_address *parts) +{ + rtx base, index, disp; + + base = parts->base; + index = parts->index; + disp = parts->disp; + + if (disp && !base && !index) +{ + if (TARGET_64BIT) + { + rtx symbol = disp; + + if (GET_CODE (disp) == CONST) + symbol = XEXP (disp, 0); + if (GET_CODE (symbol) == PLUS + && CONST_INT_P (XEXP (symbol, 1))) + symbol = XEXP (symbol, 0); + + if (GET_CODE (symbol) == LABEL_REF + || (GET_CODE (symbol) == SYMBOL_REF + && SYMBOL_REF_TLS_MODEL (symbol) == 0) + || (GET_CODE (symbol) == UNSPEC + && (XINT (symbol, 1) == UNSPEC_GOTPCREL + || XINT (symbol, 1) == UNSPEC_PCREL + || XINT (symbol, 1) == UNSPEC_GOTNTPOFF))) + return true; + } +} + return false; +} + /* Calculate the length of the memory address in the instruction encoding. Includes addr32 prefix, does not include the one-byte modrm, opcode, or other prefixes. We never generate addr32 prefix for LEA insn. 
*/ @@ -24275,25 +24311,8 @@ memory_address_length (rtx addr, bool lea) else if (disp && !base && !index) { len += 4; - if (TARGET_64BIT) - { - rtx symbol = disp; - - if (GET_CODE (disp) == CONST) - symbol = XEXP (disp, 0); - if (GET_CODE (symbol) == PLUS - && CONST_INT_P (XEXP (symbol, 1))) - symbol = XEXP (symbol, 0); - - if (GET_CODE (symbol) != LABEL_REF - && (GET_CODE (symbol) != SYMBOL_REF - || SYMBOL_REF_TLS_MODEL (symbol) != 0) - && (GET_CODE (symbol) != UNSPEC - || (XINT (symbol, 1) != UNSPEC_GOTPCREL - && XINT (symbol, 1) != UNSPEC_PCREL - && XINT (symbol, 1) != UNSPEC_GOTNTPOFF))) - len++; - } + if (rip_relative_addr_p (&parts)) + len++; } else { @@ -24856,6 +24875,122 @@ ia32_multipass_dfa_lookahead (void) } } +/* Return true if target platform supports macro-fusion. */ + +static bool +ix86_macro_fusion_p () +{ + if (TARGET_FUSE_CMP_AND_BRANCH) +return true; + else +return false; +} + +/* Check whether current microarchitecture support macro fusion + for insn pair "CONDGEN + CONDJMP". Refer to + "Intel Architectures Optimization Reference Manual". */ + +static bool +ix86_macro_fusio
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Thanks for the comments. One question inlined. Preparing another patch addressing the comments.

Regards,
Wei Mi.

On Tue, Oct 15, 2013 at 1:35 PM, Jeff Law wrote:
> On 10/03/13 12:24, Wei Mi wrote:
>>
>> Thanks,
>> Wei Mi.
>>
>> 2013-10-03 Wei Mi
>>
>> * gcc/config/i386/i386.c (memory_address_length): Extract a part
>> of code to rip_relative_addr_p.
>> (rip_relative_addr_p): New Function.
>> (ix86_macro_fusion_p): Ditto.
>> (ix86_macro_fusion_pair_p): Ditto.
>> * gcc/config/i386/i386.h: Add new tune features about
>> macro-fusion.
>> * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
>> * gcc/doc/tm.texi: Generated.
>> * gcc/doc/tm.texi.in: Ditto.
>> * gcc/haifa-sched.c (try_group_insn): New Function.
>> (group_insns_for_macro_fusion): Ditto.
>> (sched_init): Call group_insns_for_macro_fusion.
>> * gcc/sched-rgn.c (add_branch_dependences): Keep insns in
>> a SCHED_GROUP at the end of BB to remain their location.
>> * gcc/target.def: Add two hooks: macro_fusion_p and
>> macro_fusion_pair_p.
>
> I'm not going to comment on the x86 specific stuff -- I'll defer to the port
> maintainers for that.
>
>> index 61eaaef..d6726a9 100644
>> --- a/gcc/haifa-sched.c
>> +++ b/gcc/haifa-sched.c
>> @@ -6519,6 +6519,44 @@ setup_sched_dump (void)
>> ? stderr : dump_file);
>> }
>>
>> +static void
>> +try_group_insn (rtx insn)
>
> You need a comment for this function.

Ok, will add a comment for it.

>> +{
>> + unsigned int condreg1, condreg2;
>> + rtx cc_reg_1;
>> + rtx prev;
>> +
>> + targetm.fixed_condition_code_regs (&condreg1, &condreg2);
>> + cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
>> + prev = prev_nonnote_nondebug_insn (insn);
>> + if (!any_condjump_p (insn)
>> + || !reg_referenced_p (cc_reg_1, PATTERN (insn))
>> + || !prev
>> + || !modified_in_p (cc_reg_1, prev))
>> +return;
>
> I'd test !any_condjump_p at the start of this function before calling the
> target hook. If insn isn't a conditional jump, then all the other work is
> totally useless.

Ok, will fix it.

> Aren't you just trying to see if we have a comparison feeding the
> conditional jump and if they're already adjacent? Do you actually need to
> get the condition code regs to do that test?

Yes, I am trying to see if we have a comparison feeding the conditional jump and if they're already adjacent. Do you have an easier way to do that test?

>> +
>> + /* Different microarchitectures support macro fusions for different
>> + combinations of insn pairs. */
>> + if (!targetm.sched.macro_fusion_pair_p
>> + || !targetm.sched.macro_fusion_pair_p (prev, insn))
>> +return;
>> +
>> + SCHED_GROUP_P (insn) = 1;
>
> I'm surprised that SCHED_GROUP_P worked -- I've tried to do similar stuff in
> the past and ran into numerous problems trying to hijack SCHED_GROUP_P for
> this kind of purpose.
>
>>
>> static void haifa_init_only_bb (basic_block, basic_block);
>> diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
>> index e1a2dce..156359e 100644
>> --- a/gcc/sched-rgn.c
>> +++ b/gcc/sched-rgn.c
>> @@ -2443,6 +2443,8 @@ add_branch_dependences (rtx head, rtx tail)
>>cc0 setters remain at the end because they can't be moved away from
>>their cc0 user.
>>
>> + Predecessors of SCHED_GROUP_P instructions at the end remain at the
>> end.
>> +
>>COND_EXEC insns cannot be moved past a branch (see e.g. PR17808).
>> >> Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually return
>> @@ -2465,7 +2467,8 @@ add_branch_dependences (rtx head, rtx tail)
>> #endif
>> || (!reload_completed
>> && sets_likely_spilled (PATTERN (insn)
>> -|| NOTE_P (insn))
>> +|| NOTE_P (insn)
>> +|| (last != 0 && SCHED_GROUP_P (last)))
>> {
>> if (!NOTE_P (insn))
>> {
>
> This looks like a straightforward bugfix and probably should go forward
> independent of this enhancement.

Ok, I will separate it into another patch.

> Jeff
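As a concrete picture of what try_group_insn protects, here is a hypothetical loop (not from the thread) whose compare and conditional jump the x86 backend typically emits adjacently; setting SCHED_GROUP_P on the jump keeps the pair together so the hardware can macro-fuse it:

/* The loop back edge usually becomes something like
   "cmp %n,%i; jl .Ltop"; scheduling any insn between the two
   defeats macro-fusion on Core i7 and later.  */
int
sum_below (const int *a, int n)
{
  int i, s = 0;
  for (i = 0; i < n; i++)
    s += a[i];
  return s;
}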
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
> Go ahead and consider that pre-approved. Just send it to the list with a > note that I approved it in this thread. > > Jeff Thanks! The new patch addressed Jeff's comments. Is it ok for x86 maintainer? Thanks, Wei Mi. 2013-10-16 Wei Mi * gcc/config/i386/i386.c (memory_address_length): Extract a part of code to rip_relative_addr_p. (rip_relative_addr_p): New Function. (ix86_macro_fusion_p): Ditto. (ix86_macro_fusion_pair_p): Ditto. * gcc/config/i386/i386.h: Add new tune features about macro-fusion. * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto. * gcc/doc/tm.texi: Generated. * gcc/doc/tm.texi.in: Ditto. * gcc/haifa-sched.c (try_group_insn): New Function. (group_insns_for_macro_fusion): Ditto. (sched_init): Call group_insns_for_macro_fusion. * gcc/target.def: Add two hooks: macro_fusion_p and macro_fusion_pair_p. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1fd3f60..59b0bcf 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -24204,6 +24204,42 @@ ix86_instantiate_decls (void) instantiate_decl_rtl (s->rtl); } +/* Check whether x86 address PARTS is a pc-relative address. */ + +static bool +rip_relative_addr_p (struct ix86_address *parts) +{ + rtx base, index, disp; + + base = parts->base; + index = parts->index; + disp = parts->disp; + + if (disp && !base && !index) +{ + if (TARGET_64BIT) + { + rtx symbol = disp; + + if (GET_CODE (disp) == CONST) + symbol = XEXP (disp, 0); + if (GET_CODE (symbol) == PLUS + && CONST_INT_P (XEXP (symbol, 1))) + symbol = XEXP (symbol, 0); + + if (GET_CODE (symbol) == LABEL_REF + || (GET_CODE (symbol) == SYMBOL_REF + && SYMBOL_REF_TLS_MODEL (symbol) == 0) + || (GET_CODE (symbol) == UNSPEC + && (XINT (symbol, 1) == UNSPEC_GOTPCREL + || XINT (symbol, 1) == UNSPEC_PCREL + || XINT (symbol, 1) == UNSPEC_GOTNTPOFF))) + return true; + } +} + return false; +} + /* Calculate the length of the memory address in the instruction encoding. Includes addr32 prefix, does not include the one-byte modrm, opcode, or other prefixes. We never generate addr32 prefix for LEA insn. */ @@ -24275,25 +24311,8 @@ memory_address_length (rtx addr, bool lea) else if (disp && !base && !index) { len += 4; - if (TARGET_64BIT) - { - rtx symbol = disp; - - if (GET_CODE (disp) == CONST) - symbol = XEXP (disp, 0); - if (GET_CODE (symbol) == PLUS - && CONST_INT_P (XEXP (symbol, 1))) - symbol = XEXP (symbol, 0); - - if (GET_CODE (symbol) != LABEL_REF - && (GET_CODE (symbol) != SYMBOL_REF - || SYMBOL_REF_TLS_MODEL (symbol) != 0) - && (GET_CODE (symbol) != UNSPEC - || (XINT (symbol, 1) != UNSPEC_GOTPCREL - && XINT (symbol, 1) != UNSPEC_PCREL - && XINT (symbol, 1) != UNSPEC_GOTNTPOFF))) - len++; - } + if (rip_relative_addr_p (&parts)) + len++; } else { @@ -24856,6 +24875,122 @@ ia32_multipass_dfa_lookahead (void) } } +/* Return true if target platform supports macro-fusion. */ + +static bool +ix86_macro_fusion_p () +{ + if (TARGET_FUSE_CMP_AND_BRANCH) +return true; + else +return false; +} + +/* Check whether current microarchitecture support macro fusion + for insn pair "CONDGEN + CONDJMP". Refer to + "Intel Architectures Optimization Reference Manual". 
*/ + +static bool +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp) +{ + rtx src, dest; + rtx single_set = single_set (condgen); + enum rtx_code ccode; + rtx compare_set = NULL_RTX, test_if, cond; + rtx alu_set = NULL_RTX, addr = NULL_RTX; + + if (get_attr_type (condgen) != TYPE_TEST + && get_attr_type (condgen) != TYPE_ICMP + && get_attr_type (condgen) != TYPE_INCDEC + && get_attr_type (condgen) != TYPE_ALU) +return false; + + if (single_set == NULL_RTX + && !TARGET_FUSE_ALU_AND_BRANCH) +return false; + + if (single_set != NULL_RTX) +compare_set = single_set; + else +{ + int i; + rtx pat = PATTERN (condgen); + for (i = 0; i < XVECLEN (pat, 0); i++) + if (GET_CODE (XVECEXP (pat, 0, i)) == SET) + { + rtx set_src = SET_SRC (XVECEXP (pat, 0, i)); + if (GET_CODE (set_src) == COMPARE) + compare_set = XVECEXP (pat, 0, i); + else + alu_set = XVECEXP (pat, 0, i); + } +} + if (compare_set == NULL_RTX) +return false; + src = SET_SRC
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek wrote: > On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote: >> +/* Return true if target platform supports macro-fusion. */ >> + >> +static bool >> +ix86_macro_fusion_p () >> +{ >> + if (TARGET_FUSE_CMP_AND_BRANCH) >> +return true; >> + else >> +return false; >> +} > > That looks weird, why not just > > static bool > ix86_macro_fusion_p (void) > { > return TARGET_FUSE_CMP_AND_BRANCH; > } > > ? > > Marek Thanks, fixed. Wei Mi.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Ping. Is it ok for x86 maintainer? Thanks, Wei Mi. On Wed, Oct 16, 2013 at 4:25 PM, Wei Mi wrote: >> Go ahead and consider that pre-approved. Just send it to the list with a >> note that I approved it in this thread. >> >> Jeff > > Thanks! The new patch addressed Jeff's comments. > > Is it ok for x86 maintainer? > > Thanks, > Wei Mi. > > 2013-10-16 Wei Mi > > * gcc/config/i386/i386.c (memory_address_length): Extract a part > of code to rip_relative_addr_p. > (rip_relative_addr_p): New Function. > (ix86_macro_fusion_p): Ditto. > (ix86_macro_fusion_pair_p): Ditto. > * gcc/config/i386/i386.h: Add new tune features about macro-fusion. > * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto. > * gcc/doc/tm.texi: Generated. > * gcc/doc/tm.texi.in: Ditto. > * gcc/haifa-sched.c (try_group_insn): New Function. > (group_insns_for_macro_fusion): Ditto. > (sched_init): Call group_insns_for_macro_fusion. > * gcc/target.def: Add two hooks: macro_fusion_p and > macro_fusion_pair_p. > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 1fd3f60..59b0bcf 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -24204,6 +24204,42 @@ ix86_instantiate_decls (void) >instantiate_decl_rtl (s->rtl); > } > > +/* Check whether x86 address PARTS is a pc-relative address. */ > + > +static bool > +rip_relative_addr_p (struct ix86_address *parts) > +{ > + rtx base, index, disp; > + > + base = parts->base; > + index = parts->index; > + disp = parts->disp; > + > + if (disp && !base && !index) > +{ > + if (TARGET_64BIT) > + { > + rtx symbol = disp; > + > + if (GET_CODE (disp) == CONST) > + symbol = XEXP (disp, 0); > + if (GET_CODE (symbol) == PLUS > + && CONST_INT_P (XEXP (symbol, 1))) > + symbol = XEXP (symbol, 0); > + > + if (GET_CODE (symbol) == LABEL_REF > + || (GET_CODE (symbol) == SYMBOL_REF > + && SYMBOL_REF_TLS_MODEL (symbol) == 0) > + || (GET_CODE (symbol) == UNSPEC > + && (XINT (symbol, 1) == UNSPEC_GOTPCREL > + || XINT (symbol, 1) == UNSPEC_PCREL > + || XINT (symbol, 1) == UNSPEC_GOTNTPOFF))) > + return true; > + } > +} > + return false; > +} > + > /* Calculate the length of the memory address in the instruction encoding. > Includes addr32 prefix, does not include the one-byte modrm, opcode, > or other prefixes. We never generate addr32 prefix for LEA insn. */ > @@ -24275,25 +24311,8 @@ memory_address_length (rtx addr, bool lea) >else if (disp && !base && !index) > { >len += 4; > - if (TARGET_64BIT) > - { > - rtx symbol = disp; > - > - if (GET_CODE (disp) == CONST) > - symbol = XEXP (disp, 0); > - if (GET_CODE (symbol) == PLUS > - && CONST_INT_P (XEXP (symbol, 1))) > - symbol = XEXP (symbol, 0); > - > - if (GET_CODE (symbol) != LABEL_REF > - && (GET_CODE (symbol) != SYMBOL_REF > - || SYMBOL_REF_TLS_MODEL (symbol) != 0) > - && (GET_CODE (symbol) != UNSPEC > - || (XINT (symbol, 1) != UNSPEC_GOTPCREL > - && XINT (symbol, 1) != UNSPEC_PCREL > - && XINT (symbol, 1) != UNSPEC_GOTNTPOFF))) > - len++; > - } > + if (rip_relative_addr_p (&parts)) > + len++; > } >else > { > @@ -24856,6 +24875,122 @@ ia32_multipass_dfa_lookahead (void) > } > } > > +/* Return true if target platform supports macro-fusion. */ > + > +static bool > +ix86_macro_fusion_p () > +{ > + if (TARGET_FUSE_CMP_AND_BRANCH) > +return true; > + else > +return false; > +} > + > +/* Check whether current microarchitecture support macro fusion > + for insn pair "CONDGEN + CONDJMP". Refer to > + "Intel Architectures Optimization Reference Manual". 
*/ > + > +static bool > +ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp) > +{ > + rtx src, dest; > + rtx single_set = single_set (condgen); > + enum rtx_code ccode; > + rtx compare_set = NULL_RTX, test_if, cond; > + rtx alu_set = NULL_RTX, addr = NULL_RTX; > + > + if (get_attr_type (condgen) != TYPE_TEST > + && get_attr_type (condgen) != TYPE_ICMP > +
[PATCH] PR58985: testcase error.
Hi,

This is to fix the testcase error reported in PR58985. The intention of the testcase was to ensure no REG_EQUIV note was generated for a reg used in a paradoxical subreg. When the target was x86, a subreg was generated, so I omitted the subreg from the regexp pattern. However, no subreg is generated for target cris-axis-elf, so REG_EQUIV should be allowed there. Is it ok for trunk and the gcc-4.8 branch?

Thanks,
Wei Mi.

2013-11-04 Wei Mi

PR regression/58985
* testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern.

Index: testsuite/gcc.dg/pr57518.c === --- testsuite/gcc.dg/pr57518.c (revision 204353) +++ testsuite/gcc.dg/pr57518.c (working copy) @@ -2,7 +2,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-ira" } */ -/* { dg-final { scan-rtl-dump-not "REG_EQUIV\[^\n\]*mem\[^\n\]*\"ip\"" "ira" } } */ +/* { dg-final { scan-rtl-dump-not "REG_EQUIV\[^\n\]*mem\[^\n\]*\"ip\".*subreg" "ira" } } */ char ip[10]; int total;
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Thanks! The three patches are committed as r204367, r204369 and r204371.

Regards,
Wei Mi.

On Sun, Nov 3, 2013 at 5:18 PM, Jan Hubicka wrote:
>> Ping. Is it ok for the x86 maintainer?
>
> I thought I already approved the x86 bits.
>>
>> Thanks,
>> Wei Mi.
>>
>> On Wed, Oct 16, 2013 at 4:25 PM, Wei Mi wrote:
>> >> Go ahead and consider that pre-approved. Just send it to the list with a
>> >> note that I approved it in this thread.
>> >>
>> >> Jeff
>> >
>> > Thanks! The new patch addressed Jeff's comments.
>> >
>> > Is it ok for the x86 maintainer?
>> >
>> > Thanks,
>> > Wei Mi.
>> >
>> > 2013-10-16 Wei Mi
>> >
>> > * gcc/config/i386/i386.c (memory_address_length): Extract a part
>> > of code to rip_relative_addr_p.
>> > (rip_relative_addr_p): New Function.
>> > (ix86_macro_fusion_p): Ditto.
>> > (ix86_macro_fusion_pair_p): Ditto.
>> > * gcc/config/i386/i386.h: Add new tune features about macro-fusion.
>> > * gcc/config/i386/x86-tune.def (DEF_TUNE): Ditto.
>> > * gcc/doc/tm.texi: Generated.
>> > * gcc/doc/tm.texi.in: Ditto.
>> > * gcc/haifa-sched.c (try_group_insn): New Function.
>> > (group_insns_for_macro_fusion): Ditto.
>> > (sched_init): Call group_insns_for_macro_fusion.
>> > * gcc/target.def: Add two hooks: macro_fusion_p and
>> > macro_fusion_pair_p.
>> >
> The i386 bits are OK.
>
> Honza
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Tue, Nov 26, 2013 at 9:34 PM, Jeff Law wrote:
> On 11/26/13 12:33, Wei Mi wrote:
>>
>> On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law wrote:
>>>
>>>>
>>>> Doing the cleanup at the end of BB could ensure all the groups
>>>> inserted for macro-fusion will be cleaned. For groups not at the end of
>>>> a block, no matter whether they are cleaned up or not, nothing will
>>>> happen because other passes will not mess up those groups -- you said
>>>> cc0-setter/cc0-user was such a case. Is the call group a different
>>>> case?
>>>
>>> True, it would be safe, but it seems inconsistent and confusing that some
>>> SCHED_GROUP_P references would be purged and others remain.
>>>
>>> Given SCHED_GROUP_P is to be used strictly in the scheduler ISTM that we
>>> should be wiping it as we leave and that our RTL checkers ought to be
>>> verifying there are no insns with SCHED_GROUP_P left on.
>>>
>>
>> How about adding a verifier TODO_verify_sched_group_flag similar to
>> TODO_verify_rtl_sharing, and adding the verifier to the todo lists of all
>> the scheduling passes?
>>
>>>>
>>>> For sched1 and sched2, we can do that. Actually, I find it has been
>>>> done in move_insn when commit_schedule. But for modulo scheduling, I
>>>> haven't found a good place to do it.
>>>
>>> Well, that's where I'd suggest focusing attention.
>>>
>>> jeff
>>>
>>
>> After looking at it carefully, even for sched1 and sched2, it is not ok
>> to depend on move_insn in commit_schedule to clean up all the
>> SCHED_GROUP_P: suppose a block is decided not to be scheduled by
>> dbg_cnt; then SCHED_GROUP_P inside the block will not be cleaned.
>> It is even more difficult to find a place inside SMS scheduling to do
>> the cleanup.
>>
>> Any suggestions?
>
> Hmm, maybe attack from the other direction? -- could we clear SCHED_GROUP_P
> for each insn at the start of this loop in sched_analyze?
>
> It's not as clean in the sense that SCHED_GROUP_P "escapes" the scheduler,
> but it might be an option.
>
>    for (insn = head;; insn = NEXT_INSN (insn))
>      {
>        if (INSN_P (insn))
>          {
>            /* And initialize deps_lists. */
>            sd_init_insn (insn);
>          }
>
>        deps_analyze_insn (deps, insn);
>
>        if (insn == tail)
>          {
>            if (sched_deps_info->use_cselib)
>              cselib_finish ();
>            return;
>          }
>      }
> Jeff

Thanks for the suggestion. It looks workable. Then I need to move the SCHED_GROUP_P setting for macro-fusion from sched_init to a place inside sched_analyze after the SCHED_GROUP_P cleanup. It will be more consistent with the settings for the cc0 setter-user group and the call group, which are both inside sched_analyze. I am trying this method...

Thanks,
Wei.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
>> Hmm, maybe attack from the other direction? -- could we clear SCHED_GROUP_P
>> for each insn at the start of this loop in sched_analyze?
>>
>> It's not as clean in the sense that SCHED_GROUP_P "escapes" the scheduler,
>> but it might be an option.
>>
>>    for (insn = head;; insn = NEXT_INSN (insn))
>>      {
>>        if (INSN_P (insn))
>>          {
>>            /* And initialize deps_lists. */
>>            sd_init_insn (insn);
>>          }
>>
>>        deps_analyze_insn (deps, insn);
>>
>>        if (insn == tail)
>>          {
>>            if (sched_deps_info->use_cselib)
>>              cselib_finish ();
>>            return;
>>          }
>>      }
>> Jeff
>
> Thanks for the suggestion. It looks workable. Then I need to move the
> SCHED_GROUP_P setting for macro-fusion from sched_init to a place
> inside sched_analyze after the SCHED_GROUP_P cleanup. It will be more
> consistent with the settings for the cc0 setter-user group and the call
> group, which are both inside sched_analyze.
> I am trying this method...
>
> Thanks,
> Wei.

Here is the patch. The patch does the SCHED_GROUP_P cleanup in sched_analyze before deps_analyze_insn sets SCHED_GROUP_P and chains the insn with previous insns. And it moves try_group_insn for macro-fusion from sched_init to sched_analyze_insn. Bootstrap and regression pass on x86_64-linux-gnu. Is it ok?

Thanks,
Wei.

2013-11-27 Wei Mi

PR rtl-optimization/59020
* sched-deps.c (try_group_insn): Move it from haifa-sched.c to here.
(sched_analyze_insn): Call try_group_insn.
(sched_analyze): Clean up SCHED_GROUP_P before starting the analysis.
* haifa-sched.c (try_group_insn): Moved to sched-deps.c.
(group_insns_for_macro_fusion): Removed.
(sched_init): Remove calling group_insns_for_macro_fusion.

2013-11-27 Wei Mi

PR rtl-optimization/59020
* testsuite/gcc.dg/pr59020.c: New.
* testsuite/gcc.dg/macro-fusion-1.c: New.
* testsuite/gcc.dg/macro-fusion-2.c: New.

Index: sched-deps.c === --- sched-deps.c(revision 204923) +++ sched-deps.c(working copy) @@ -2820,6 +2820,37 @@ sched_analyze_2 (struct deps_desc *deps, sched_deps_info->finish_rhs (); } +/* Try to group comparison and the following conditional jump INSN if + they're already adjacent. This is to prevent the scheduler from scheduling + them apart. */ + +static void +try_group_insn (rtx insn) +{ + unsigned int condreg1, condreg2; + rtx cc_reg_1; + rtx prev; + + if (!any_condjump_p (insn)) +return; + + targetm.fixed_condition_code_regs (&condreg1, &condreg2); + cc_reg_1 = gen_rtx_REG (CCmode, condreg1); + prev = prev_nonnote_nondebug_insn (insn); + if (!reg_referenced_p (cc_reg_1, PATTERN (insn)) + || !prev + || !modified_in_p (cc_reg_1, prev)) +return; + + /* Different microarchitectures support macro fusions for different + combinations of insn pairs. */ + if (!targetm.sched.macro_fusion_pair_p + || !targetm.sched.macro_fusion_pair_p (prev, insn)) +return; + + SCHED_GROUP_P (insn) = 1; +} + /* Analyze an INSN with pattern X to find all dependencies. */ static void sched_analyze_insn (struct deps_desc *deps, rtx x, rtx insn) @@ -2843,6 +2874,11 @@ sched_analyze_insn (struct deps_desc *de can_start_lhs_rhs_p = (NONJUMP_INSN_P (insn) && code == SET); + /* Group compare and branch insns for macro-fusion. */ + if (targetm.sched.macro_fusion_p + && targetm.sched.macro_fusion_p ()) +try_group_insn (insn); + if (may_trap_p (x)) /* Avoid moving trapping instructions across function calls that might not always return. */ @@ -3733,6 +3769,10 @@ sched_analyze (struct deps_desc *deps, r { /* And initialize deps_lists. */ sd_init_insn (insn); + /* Clean up SCHED_GROUP_P which may have been set by last +scheduler pass.
*/ + if (SCHED_GROUP_P (insn)) + SCHED_GROUP_P (insn) = 0; } deps_analyze_insn (deps, insn); Index: haifa-sched.c === --- haifa-sched.c (revision 204923) +++ haifa-sched.c (working copy) @@ -6554,50 +6554,6 @@ setup_sched_dump (void) ? stderr : dump_file); } -/* Try to group comparison and the following conditional jump INSN if - they're already adjacent. This is to prevent scheduler from scheduling - them apart. */ - -static void -try_group_insn (rtx insn) -{ - unsigned int condreg1, condreg2; - rtx cc_reg_1; - rtx prev; - - if (!any_condjump_p (insn)) -return; - - targetm.fixed_condition_code_re
Re: [PATCH] Builtins handling in IVOPT
Ping. Thanks, wei. On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote: > bootstrap and regression of the updated patch pass. > > On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote: >> On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak >> wrote: >>> Hi, >>> >>>> This patch works on the intrinsic calls handling issue in IVOPT mentioned >>>> here: >>>> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >>>> >>>> In find_interesting_uses_stmt, it changes >>>> >>>> arg = expr >>>> __builtin_xxx (arg) >>>> >>>> to >>>> >>>> arg = expr; >>>> tmp = addr_expr (mem_ref(arg)); >>>> __builtin_xxx (tmp, ...) >>> >>> this looks a bit confusing (and wasteful) to me. It would make more sense to >>> just record the argument as USE_ADDRESS and do the rewriting in >>> rewrite_use_address. >>> >>> Zdenek >> >> I updated the patch. The gimple changing part is now moved to >> rewrite_use_address. Add support for plain address expr in addition to >> reference expr in find_interesting_uses_address. >> >> bootstrap and testing is going on. >> >> 2013-11-22 Wei Mi >> >> * expr.c (expand_expr_addr_expr_1): Not to split TMR. >> (expand_expr_real_1): Ditto. >> * targhooks.c (default_builtin_has_mem_ref_p): Default >> builtin. >> * tree-ssa-loop-ivopts.c (builtin_has_mem_ref_p): New function. >> (rewrite_use_address): Add TMR for builtin. >> (find_interesting_uses_stmt): Special handling of builtins. >> * gimple-expr.c (is_gimple_address): Add handling of TMR. >> * gimple-expr.h (is_gimple_addressable): Ditto. >> * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. >> (ix86_atomic_assign_expand_fenv): Ditto. >> (ix86_expand_special_args_builtin): Special handling of TMR for >> builtin. >> * target.def (builtin_has_mem_ref_p): New hook. >> * doc/tm.texi.in: Ditto. >> * doc/tm.texi: Generated. >> >> 2013-11-22 Wei Mi >> >> * gcc.dg/tree-ssa/ivopt_5.c: New test. >> >> Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c >> === >> --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) >> +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) >> @@ -0,0 +1,21 @@ >> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ >> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */ >> + >> +/* Make sure only one iv is selected after IVOPT. 
*/ >> + >> +#include >> +extern __m128i arr[], d[]; >> +void test (void) >> +{ >> +unsigned int b; >> +for (b = 0; b < 1000; b += 2) { >> + __m128i *p = (__m128i *)(&d[b]); >> + __m128i a = _mm_load_si128(&arr[4*b+3]); >> + __m128i v = _mm_loadu_si128(p); >> + v = _mm_xor_si128(v, a); >> + _mm_storeu_si128(p, v); >> +} >> +} >> + >> +/* { dg-final { scan-tree-dump-times "PHI > +/* { dg-final { cleanup-tree-dump "ivopts" } } */ >> Index: targhooks.c >> === >> --- targhooks.c (revision 204792) >> +++ targhooks.c (working copy) >> @@ -566,6 +566,13 @@ default_builtin_reciprocal (unsigned int >> } >> >> bool >> +default_builtin_has_mem_ref_p (int built_in_function ATTRIBUTE_UNUSED, >> + int i ATTRIBUTE_UNUSED) >> +{ >> + return false; >> +} >> + >> +bool >> hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false ( >> cumulative_args_t ca ATTRIBUTE_UNUSED, >> enum machine_mode mode ATTRIBUTE_UNUSED, >> Index: expr.c >> === >> --- expr.c (revision 204792) >> +++ expr.c (working copy) >> @@ -7467,7 +7467,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t >> tem = fold_build_pointer_plus (tem, TREE_OPERAND (exp, 1)); >> return expand_expr (tem, target, tmode, modifier); >>} >> +case TARGET_MEM_REF: >> + { >> + int old_cse_not_expected; >> + addr_space_t as >> + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0; >> >> + result = addr_for_mem_ref (exp, as, true); >> +
Re: [PATCH] Builtins handling in IVOPT
Ping. Thanks, Wei. On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi wrote: > Ping. > > Thanks, > wei. > > On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote: >> bootstrap and regression of the updated patch pass. >> >> On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote: >>> On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak >>> wrote: >>>> Hi, >>>> >>>>> This patch works on the intrinsic calls handling issue in IVOPT mentioned >>>>> here: >>>>> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >>>>> >>>>> In find_interesting_uses_stmt, it changes >>>>> >>>>> arg = expr >>>>> __builtin_xxx (arg) >>>>> >>>>> to >>>>> >>>>> arg = expr; >>>>> tmp = addr_expr (mem_ref(arg)); >>>>> __builtin_xxx (tmp, ...) >>>> >>>> this looks a bit confusing (and wasteful) to me. It would make more sense >>>> to >>>> just record the argument as USE_ADDRESS and do the rewriting in >>>> rewrite_use_address. >>>> >>>> Zdenek >>> >>> I updated the patch. The gimple changing part is now moved to >>> rewrite_use_address. Add support for plain address expr in addition to >>> reference expr in find_interesting_uses_address. >>> >>> bootstrap and testing is going on. >>> >>> 2013-11-22 Wei Mi >>> >>> * expr.c (expand_expr_addr_expr_1): Not to split TMR. >>> (expand_expr_real_1): Ditto. >>> * targhooks.c (default_builtin_has_mem_ref_p): Default >>> builtin. >>> * tree-ssa-loop-ivopts.c (builtin_has_mem_ref_p): New function. >>> (rewrite_use_address): Add TMR for builtin. >>> (find_interesting_uses_stmt): Special handling of builtins. >>> * gimple-expr.c (is_gimple_address): Add handling of TMR. >>> * gimple-expr.h (is_gimple_addressable): Ditto. >>> * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. >>> (ix86_atomic_assign_expand_fenv): Ditto. >>> (ix86_expand_special_args_builtin): Special handling of TMR for >>> builtin. >>> * target.def (builtin_has_mem_ref_p): New hook. >>> * doc/tm.texi.in: Ditto. >>> * doc/tm.texi: Generated. >>> >>> 2013-11-22 Wei Mi >>> >>> * gcc.dg/tree-ssa/ivopt_5.c: New test. >>> >>> Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c >>> === >>> --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) >>> +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) >>> @@ -0,0 +1,21 @@ >>> +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ >>> +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */ >>> + >>> +/* Make sure only one iv is selected after IVOPT. */ >>> + >>> +#include >>> +extern __m128i arr[], d[]; >>> +void test (void) >>> +{ >>> +unsigned int b; >>> +for (b = 0; b < 1000; b += 2) { >>> + __m128i *p = (__m128i *)(&d[b]); >>> + __m128i a = _mm_load_si128(&arr[4*b+3]); >>> + __m128i v = _mm_loadu_si128(p); >>> + v = _mm_xor_si128(v, a); >>> + _mm_storeu_si128(p, v); >>> +} >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "PHI >> +/* { dg-final { cleanup-tree-dump "ivopts" } } */ >>> Index: targhooks.c >>> === >>> --- targhooks.c (revision 204792) >>> +++ targhooks.c (working copy) >>> @@ -566,6 +566,13 @@ default_builtin_reciprocal (unsigned int >>> } >>> >>> bool >>> +default_builtin_has_mem_ref_p (int built_in_function ATTRIBUTE_UNUSED, >>> + int i ATTRIBUTE_UNUSED) >>> +{ >>> + return false; >>> +} >>> + >>> +bool >>> hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false ( >>> cumulative_args_t ca ATTRIBUTE_UNUSED, >>> enum machine_mode mode ATTRIBUTE_UNUSED, >>> Index: expr.c >>> === >>> --- expr.c (revision 204792) >>> +++ expr.c (working copy) >>>
[google][gcc-4_9] update hardreg costs only when conflict_costs[] < 0
Hi,

The patch is for the google branch only.

In r216697, when a hardreg is assigned to an allocno, a positive cost is added to the conflict allocnos to reflect their disfavor of the hardreg. However, the fact that a conflict allocno disfavors a hard_regno doesn't necessarily mean the current allocno should prefer that hard_regno, so it is incorrect to update the costs of an allocno directly according to its conflict allocnos. The patch changes the code to update costs[i] of an allocno only when conflict_costs[i] < 0, i.e., when the conflict allocno prefers hardreg i.

Another issue is that the costs of an allocno are updated only when the conflict allocno is not marked as may_be_spilled_p. However, even if a conflict allocno is marked as may_be_spilled_p right now, it still has a high probability of getting colored later. It is not right to ignore the preferences from those conflict allocnos marked as may_be_spilled_p. The patch changes that.

Test: Gcc unit tests ok. Minor improvement for google internal benchmarks.

Thanks,
Wei.

gcc/ChangeLog:
2015-12-10 Wei Mi

* ira-color.c (restore_costs_from_conflicts): Don't record
the cost change.
(update_conflict_hard_regno_costs): Update costs[i] only
when conflict_costs[i] < 0.
(assign_hard_reg): Ditto.

Index: gcc/ira-color.c === --- gcc/ira-color.c (revision 231143) +++ gcc/ira-color.c (working copy) @@ -1588,9 +1588,11 @@ restore_costs_from_conflicts (ira_allocn prev = curr; } /* Propagate the disfavor of hardreg from conflict_a to the -allocnos connecting with conflict_a via copies. */ +allocnos connecting with conflict_a via copies. +Note: once the hardreg is assigned to a, it will not be +changed, so we don't need to record this change. */ update_costs_from_allocno (conflict_a, hardreg, -1, false, true, true); +1, false, false, true); } } } @@ -1601,7 +1603,7 @@ restore_costs_from_conflicts (ira_allocn update increases chances to remove some copies. */ static void update_conflict_hard_regno_costs (int *costs, enum reg_class aclass, - bool decr_p) + bool decr_p, bool conflict) { int i, cost, class_size, freq, mult, div, divisor; int index, hard_regno; @@ -1682,7 +1684,16 @@ update_conflict_hard_regno_costs (int *c cont_p = true; if (decr_p) cost = -cost; - costs[index] += cost; + /* conflict being true indicates this is updating costs[] + according to preferences of allocnos connected by copies + to the conflict allocnos. + The fact a conflict allocno disfavors hard_regno doesn't + necessarily mean the current allocno should prefer hard_regno + (actually only a little), so we update costs[] only + when the conflict allocno prefers hard_regno, i.e., when + conflict_costs[i] < 0. */ + if (conflict && conflict_costs [i] < 0) + costs[index] += cost; } } /* Probably 5 hops will be enough. */ @@ -1934,7 +1945,6 @@ assign_hard_reg (ira_allocno_t a, bool r } } else if (! retry_p - && ! ALLOCNO_COLOR_DATA (conflict_a)->may_be_spilled_p /* Don't process the conflict allocno twice. */ && (ALLOCNO_COLOR_DATA (conflict_a)->last_process != curr_allocno_process)) @@ -1967,7 +1977,13 @@ assign_hard_reg (ira_allocno_t a, bool r ->new_conflict_hard_regs, hard_regno)) continue; - full_costs[j] -= conflict_costs[k]; + /* The fact conflict_a disfavors hard_regno doesn't + necessarily mean the current allocno should prefer + hard_regno so much (only a little), so we only + update full_costs[] when conflict_a prefers + hard_regno, i.e., when conflict_costs[k] < 0.
*/ + if (conflict_costs[k] < 0) + full_costs[j] -= conflict_costs[k]; } queue_update_cost (conflict_a, NULL, COST_HOP_DIVISOR); @@ -1977,7 +1993,7 @@ assign_hard_reg (ira_allocno_t a, bool r if (! retry_p) /* Take into account preferences of allocnos connected by copies to the conflict allocnos. */ -update_conflict_hard_regno_costs (full_costs, aclass, true); +update_conflict_hard_regno_costs (full_costs, a
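To make the cost-update rule concrete, here is a minimal standalone sketch of the guarded update described above; this is not the ira-color.c code, and the names are simplified stand-ins. In IRA a negative cost means preference, so conflict_costs[i] < 0 says the conflicting allocno actively wants hard register i, and only then should the current allocno be steered away from it:

/* Sketch only: update the current allocno's costs from one conflict
   allocno's cost vector.  A negative entry means the conflict allocno
   prefers that hard register.  */
static void
update_costs_from_conflict (int *costs, const int *conflict_costs,
                            int class_size)
{
  int i;

  for (i = 0; i < class_size; i++)
    {
      /* A mere disfavor (conflict_costs[i] >= 0) says little about what
         the current allocno should do, so it is ignored.  */
      if (conflict_costs[i] < 0)
        /* Subtracting a negative entry raises costs[i], steering the
           current allocno away from a register its conflict wants.  */
        costs[i] -= conflict_costs[i];
    }
}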
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
Yes, that is intentional. It is to avoid assigning a discriminator to the first call in a group of calls with the same source lineno. Starting from the second call in the group, each call will get a different discriminator from the previous call in the same group. Thanks, Wei. On Thu, Aug 7, 2014 at 12:17 PM, Cary Coutant wrote: >> static int >> -next_discriminator_for_locus (location_t locus) >> +increase_discriminator_for_locus (location_t locus, bool return_next) >> { >>struct locus_discrim_map item; >>struct locus_discrim_map **slot; >> @@ -934,8 +936,10 @@ next_discriminator_for_locus (location_t >>(*slot)->locus = locus; >>(*slot)->discriminator = 0; >> } >> + >>(*slot)->discriminator++; >> - return (*slot)->discriminator; >> + return return_next ? (*slot)->discriminator >> +: (*slot)->discriminator - 1; >> } > > Won't this have the effect of sometimes incrementing the next > available discriminator without actually using the new value? That is, > if you call it once with return_next == false, and then with > return_next == true. > > -cary
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
No, it is not. This IR is dumped before early inline -- just after pass_build_cfg. The line number of the destructor is marked according to where its constructor is located, instead of where the destructor call is inserted. This is also problematic. Wei. On Thu, Aug 7, 2014 at 12:11 PM, Xinliang David Li wrote: > Is this > > [1.cc : 179:64] Reader::~Reader (&version); > > from an inline instance? > > David > > On Wed, Aug 6, 2014 at 10:18 AM, Wei Mi wrote: >> We saw a bb like this in the IR dump after pass_build_cfg: >> >> : >> [1.cc : 205:45] D.332088 = table->_vptr.Table; >> [1.cc : 205:45] D.332134 = D.332088 + 104; >> [1.cc : 205:45] D.332135 = [1.cc : 205] *D.332134; >> [1.cc : 205:45] D.332092 = [1.cc : 205] &this->cp_stream_; >> [1.cc : 205:46] OBJ_TYPE_REF(D.332135;(struct Table)table->13) (table, >> cp_arg, D.332092); // indirect call >> [1.cc : 179:64] Reader::~Reader (&version); >> [1.cc : 205:46] Switcher::~Switcher (&tcswr); >> >> The indirect call above has the same source lineno as "Switcher::~Switcher >> (&tcswr);", but they have no discriminator, so they cannot be discriminated >> in autofdo. This causes the problem that autofdo mistakenly regards >> "Switcher::~Switcher (&tcswr);" as a target of the indirect call above, and >> makes a wrong promotion. >> >> The existing code has the logic to assign different discriminators to calls >> with the same lineno, but it only works when the lineno in a bb is >> monotonic. In this case, there is another stmt with lineno 179 between the >> indirect call and "Switcher::~Switcher (&tcswr);" (both with lineno 205), so >> the existing code will not assign different discriminators to them. >> >> The patch is to assign discriminators to calls with the same lineno anyway. >> >> regression test is running. internal perf test for autofdo shows a little >> improvement. Ok for google-4_9 if regression passes? >> >> Thanks, >> Wei. >> >> ChangeLog: >> >> 2014-08-06 Wei Mi >> >> * tree-cfg.c (increase_discriminator_for_locus): It was >> next_discriminator_for_locus. Add a param "return_next". >> (next_discriminator_for_locus): Renamed. >> (assign_discriminator): Use the renamed func. >> (assign_discriminators): Assign different discriminators >> for calls with the same lineno. >> >> >> Index: tree-cfg.c >> === >> --- tree-cfg.c (revision 213402) >> +++ tree-cfg.c (working copy) >> @@ -914,10 +914,12 @@ make_edges (void) >> /* Find the next available discriminator value for LOCUS. The >> discriminator distinguishes among several basic blocks that >> share a common locus, allowing for more accurate sample-based >> - profiling. */ >> + profiling. If RETURN_NEXT is true, return the discriminator >> + value after the increase, else return the discriminator value >> + before the increase. */ >> >> static int >> -next_discriminator_for_locus (location_t locus) >> +increase_discriminator_for_locus (location_t locus, bool return_next) >> { >>struct locus_discrim_map item; >>struct locus_discrim_map **slot; >> @@ -934,8 +936,10 @@ next_discriminator_for_locus (location_t >>(*slot)->locus = locus; >>(*slot)->discriminator = 0; >> } >> + >>(*slot)->discriminator++; >> - return (*slot)->discriminator; >> + return return_next ? (*slot)->discriminator >> +: (*slot)->discriminator - 1; >> } >> >> /* Return TRUE if LOCUS1 and LOCUS2 refer to the same source line.
*/ >> @@ -974,7 +978,7 @@ assign_discriminator (location_t locus, >>if (locus == UNKNOWN_LOCATION) >> return; >> >> - discriminator = next_discriminator_for_locus (locus); >> + discriminator = increase_discriminator_for_locus (locus, true); >> >>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> @@ -1009,23 +1013,16 @@ assign_discriminators (void) >>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> gimple stmt = gsi_stmt (gsi); >> - if (curr_locus == UNKNOWN_LOCATION) >> - { >> - curr_locus = gimple_location (stmt); >> - } >> - else if (!same_line_p (curr_locus, gimple_location (stmt))) >
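A hypothetical C fragment (not from the failing application; g and h are made-up names) illustrating the situation the patch targets:

/* Both calls below share one source line, so their profile samples
   collapse onto the same (file, line) pair.  Giving the second call a
   distinct discriminator lets autofdo attribute indirect-call targets
   to the right call site instead of promoting the wrong one.  */
extern int g (void);
extern int h (void);

int
f (void)
{
  return g () + h ();   /* two calls, one lineno */
}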
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
On Thu, Aug 7, 2014 at 2:40 PM, Xinliang David Li wrote: > On Thu, Aug 7, 2014 at 2:20 PM, Wei Mi wrote: >> No, it is not. This IR is dumped before early inline -- just after >> pass_build_cfg. The line number of the destructor is marked >> according to where its constructor is located, > > The definition location or the invocation location? > > David > The definition location and the invocation location are on the same source line in this case. Wei.
Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
Ping. On Sun, Jul 27, 2014 at 11:08 PM, Wei Mi wrote: >> But fact is that it is _not_ necessary to split the block because there >> are no outgoing abnormal edges from it. >> >> The verifier failure is an artifact from using the same predicates during >> CFG building and CFG verifying (usually ok, but for this particular >> case it leads to this issue). >> >> So I don't think your patch is the proper way to address this issue >> (while it certainly works). >> >> Instead whether a call can make abnormal gotos should be recorded >> per-call and stored on the gimple-call. Martin - this is exactly >> one of the cases your patch would address? >> > > Thanks for the comment and thanks to Martin's patch. I tried the patch. > It works well to address both pr60449 and pr61776 after some > extension. One extension is to replace the GF_CALL_LEAF attribute with > GF_CALL_NO_ABNORMAL_GOTO. That is because not only can dropping the "leaf" > attribute during lto symbol merging introduce the control flow > verification problem in pr60449; dropping "const/pure" attributes > can introduce the same problem too. It is unnecessary to introduce > per-call attributes for all three of ECF_LEAF/ECF_CONST/ECF_PURE, > so GF_CALL_NO_ABNORMAL_GOTO is introduced to indicate that a call stmt > has no abnormal goto. > > GF_CALL_NO_ABNORMAL_GOTO will be set according to gimple_call_flags () > once a gimple call stmt is created, then updated in execute_fixup_cfg > and cleanup_tree_cfg. > > I posted the extended patch here. I didn't add the noreturn part > because it has no direct impact on pr60449 and pr61776. I can help > Martin test and post that part as an independent patch later. > > bootstrap and regression pass on x86_64-linux-gnu. Is it ok? > > Thanks, > Wei.
Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
Sorry for the late reply. I took some time to make myself more familiar with NORETURN and related code, and finally I understood what you mean and saw why only GF_CALL_CTRL_ALTERING was enough and GF_CALL_NORETURN was unneeded. With your suggestion, the change looks much briefer! Please check if the new patch attached is ok. bootstrap and regression tests pass on x86_64-linux-gnu. Thanks, Wei. > +static void > +update_no_abnormal_goto_attr (basic_block bb) > +{ > + gimple_stmt_iterator gsi; > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > +{ > > it should be enough to check these on last stmts of a basic block, no? Yes, that is better. > > That you call update_no_abnormal_goto_attr from two places > before cleanup_tree_cfg_bb suggests you may want to perform > this change in cleanup_control_flow_bb which already looks > at the last stmt only? Changed. > > Btw, I originally had this whole idea of moving flags to the gimple > stmt level because of the "interesting" way we handle the noreturn > attribute (calls to noreturn functions also end basic-blocks). > > Thus would it be possible to turn all these into a single flag, > GF_CALL_CTRL_ALTERING? That is, cover everything > that is_ctrl_altering_stmt covers? I suggest we initialize it at > CFG build time and only ever clear it later. Good idea! ChangeLog: 2014-08-19 Martin Jambor Wei Mi PR ipa/60449 PR middle-end/61776 * tree-ssa-operands.c (update_stmt_operands): Remove MODIFIED_NORETURN_CALLS. * tree-cfgcleanup.c (cleanup_call_ctrl_altering_flag): New func. (cleanup_control_flow_bb): Use cleanup_call_ctrl_altering_flag. (split_bb_on_noreturn_calls): Renamed from split_bbs_on_noreturn_calls. (cleanup_tree_cfg_1): Use split_bb_on_noreturn_calls. * tree-ssanames.h: Remove MODIFIED_NORETURN_CALLS. * gimple.h (enum gf_mask): Add GF_CALL_CTRL_ALTERING. (gimple_call_set_ctrl_altering): New func. (gimple_call_ctrl_altering_p): Ditto. * tree-cfg.c (gimple_call_initialize_ctrl_altering): Ditto. (make_blocks): Use gimple_call_initialize_ctrl_altering. (is_ctrl_altering_stmt): Use gimple_call_ctrl_altering_p. (execute_fixup_cfg): Use gimple_call_ctrl_altering_p and remove MODIFIED_NORETURN_CALLS. 2014-08-19 Martin Jambor Wei Mi PR ipa/60449 PR middle-end/61776 * testsuite/gcc.dg/lto/pr60449_1.c: New test. * testsuite/gcc.dg/lto/pr60449_0.c: New test. * testsuite/gcc.dg/pr61776.c: New test. Index: tree-ssa-operands.c === --- tree-ssa-operands.c (revision 212442) +++ tree-ssa-operands.c (working copy) @@ -1087,12 +1087,6 @@ update_stmt_operands (struct function *f timevar_push (TV_TREE_OPS); - /* If the stmt is a noreturn call queue it to be processed by - split_bbs_on_noreturn_calls during cfg cleanup. */ - if (is_gimple_call (stmt) - && gimple_call_noreturn_p (stmt)) -vec_safe_push (MODIFIED_NORETURN_CALLS (fn), stmt); - gcc_assert (gimple_modified_p (stmt)); build_ssa_operands (fn, stmt); gimple_set_modified (stmt, false); Index: tree-cfgcleanup.c === --- tree-cfgcleanup.c (revision 212442) +++ tree-cfgcleanup.c (working copy) @@ -162,6 +162,23 @@ cleanup_control_expr_graph (basic_block return retval; } +/* Cleanup the GF_CALL_CTRL_ALTERING flag according to + to updated gimple_call_flags. 
*/ + +static void +cleanup_call_ctrl_altering_flag (gimple bb_end) +{ + if (!is_gimple_call (bb_end) + || !gimple_call_ctrl_altering_p (bb_end)) +return; + + int flags = gimple_call_flags (bb_end); + if (((flags & (ECF_CONST | ECF_PURE)) + && !(flags & ECF_LOOPING_CONST_OR_PURE)) + || (flags & ECF_LEAF)) +gimple_call_set_ctrl_altering (bb_end, false); +} + /* Try to remove superfluous control structures in basic block BB. Returns true if anything changes. */ @@ -182,6 +199,9 @@ cleanup_control_flow_bb (basic_block bb) stmt = gsi_stmt (gsi); + /* Try to cleanup ctrl altering flag for call which ends bb. */ + cleanup_call_ctrl_altering_flag (stmt); + if (gimple_code (stmt) == GIMPLE_COND || gimple_code (stmt) == GIMPLE_SWITCH) retval |= cleanup_control_expr_graph (bb, gsi); @@ -594,30 +614,24 @@ fixup_noreturn_call (gimple stmt) known not to return, and remove the unreachable code. */ static bool -split_bbs_on_noreturn_calls (void) +split_bb_on_noreturn_calls (basic_block bb) { bool changed = false; - gimple stmt; - basic_block bb; + gimple_stmt_iterator gsi; - /* Detect cases where a mid-block call is now known not to return. */ - if (cfun->gimple_df) -while (vec_safe_leng
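The ChangeLog above names gimple_call_set_ctrl_altering and gimple_call_ctrl_altering_p; roughly, such subcode-bit accessors look like the sketch below. This is a simplified rendering, not the exact gimple.h text (the real versions also assert the statement is a GIMPLE_CALL):

/* Simplified sketch of the per-call flag accessors.  The flag is one
   bit of the statement's subcode word; per the discussion above it is
   initialized when the CFG is built and only ever cleared afterwards.  */
static inline void
gimple_call_set_ctrl_altering (gimple s, bool ctrl_altering_p)
{
  if (ctrl_altering_p)
    s->subcode |= GF_CALL_CTRL_ALTERING;
  else
    s->subcode &= ~GF_CALL_CTRL_ALTERING;
}

static inline bool
gimple_call_ctrl_altering_p (gimple s)
{
  return (s->subcode & GF_CALL_CTRL_ALTERING) != 0;
}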
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
To avoid the unused new discriminator value, I added a map "found_call_this_line" to track whether a call is the first call seen on its source line when assigning discriminators. For the first call in a source line, its discriminator is 0. For the following calls in the same source line, a new discriminator will be used every time. The new patch is attached. Internal perf test and regression test are ok. Is it ok for google-4_9? Thanks, Wei. On Thu, Aug 7, 2014 at 2:10 PM, Wei Mi wrote: > Yes, that is intentional. It is to avoid assigning a discriminator to > the first call in a group of calls with the same source lineno. > Starting from the second call in the group, each call will get a different > discriminator from the previous call in the same group. > > Thanks, > Wei. > > On Thu, Aug 7, 2014 at 12:17 PM, Cary Coutant wrote: >>> static int >>> -next_discriminator_for_locus (location_t locus) >>> +increase_discriminator_for_locus (location_t locus, bool return_next) >>> { >>>struct locus_discrim_map item; >>>struct locus_discrim_map **slot; >>> @@ -934,8 +936,10 @@ next_discriminator_for_locus (location_t >>>(*slot)->locus = locus; >>>(*slot)->discriminator = 0; >>> } >>> + >>>(*slot)->discriminator++; >>> - return (*slot)->discriminator; >>> + return return_next ? (*slot)->discriminator >>> +: (*slot)->discriminator - 1; >>> } >> >> Won't this have the effect of sometimes incrementing the next >> available discriminator without actually using the new value? That is, >> if you call it once with return_next == false, and then with >> return_next == true. >> >> -cary ChangeLog: 2014-08-24 Wei Mi * tree-cfg.c (assign_discriminators): Assign different discriminators for calls belonging to the same source line. Index: tree-cfg.c === --- tree-cfg.c (revision 213402) +++ tree-cfg.c (working copy) @@ -992,7 +992,13 @@ static void assign_discriminators (void) { basic_block bb; + /* If there is a location saved in the hash_table, it means that we + already found a call in the source line before. For the calls which + are not the first call found in the same source line, we don't assign + a new discriminator to it, so that .debug_line section will be smaller. */ + hash_table found_call_this_line; + found_call_this_line.create (13); FOR_EACH_BB_FN (bb, cfun) { edge e; @@ -1009,23 +1015,31 @@ assign_discriminators (void) for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple stmt = gsi_stmt (gsi); - if (curr_locus == UNKNOWN_LOCATION) - { - curr_locus = gimple_location (stmt); - } - else if (!same_line_p (curr_locus, gimple_location (stmt))) + if (gimple_code (stmt) == GIMPLE_CALL) { + struct locus_discrim_map item; + struct locus_discrim_map **slot; + curr_locus = gimple_location (stmt); - curr_discr = 0; - } - else if (curr_discr != 0) - { - gimple_set_location (stmt, location_with_discriminator ( - gimple_location (stmt), curr_discr)); + item.locus = curr_locus; + item.discriminator = 0; + slot = found_call_this_line.find_slot (&item, INSERT); + /* If the current call is not the first call seen in curr_locus, +assign the next discriminator to it, else keep its discriminator +unchanged. */ + if (*slot != HTAB_EMPTY_ENTRY) + { + curr_discr = next_discriminator_for_locus (curr_locus); + gimple_set_location (stmt, location_with_discriminator ( + gimple_location (stmt), curr_discr)); + } + else + { + *slot = XNEW (struct locus_discrim_map); + (*slot)->locus = curr_locus; + (*slot)->discriminator = 0; + } } - /* Allocate a new discriminator for CALL stmt. 
*/ - if (gimple_code (stmt) == GIMPLE_CALL) - curr_discr = next_discriminator_for_locus (curr_locus); } if (locus == UNKNOWN_LOCATION) @@ -1047,6 +1061,7 @@ assign_discriminators (void) } } } + found_call_this_line.dispose (); } /* Create the edges for a GIMPLE_COND starting at block BB. */
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
Hi Cary, Is the new patch ok for google-4_9? Thanks, Wei. On Sun, Aug 24, 2014 at 8:53 PM, Wei Mi wrote: > To avoid the unused new discriminator value, I added a map > "found_call_this_line" to track whether a call is the first call in a > source line seen when assigning discriminators. For the first call in > a source line, its discriminator is 0. For the following calls in the > same source line, a new discriminator will be used everytime. The new > patch is attached. Internal perf test and regression test are ok. Is > it ok for google-4_9? > > Thanks, > Wei. > > > > On Thu, Aug 7, 2014 at 2:10 PM, Wei Mi wrote: >> Yes, that is intentional. It is to avoid assiging a discriminator for >> the first call in the group of calls with the same source lineno. >> Starting from the second call in the group, it will get a different >> discriminator with previous call in the same group. >> >> Thanks, >> Wei. >> >> On Thu, Aug 7, 2014 at 12:17 PM, Cary Coutant wrote: >>>> static int >>>> -next_discriminator_for_locus (location_t locus) >>>> +increase_discriminator_for_locus (location_t locus, bool return_next) >>>> { >>>>struct locus_discrim_map item; >>>>struct locus_discrim_map **slot; >>>> @@ -934,8 +936,10 @@ next_discriminator_for_locus (location_t >>>>(*slot)->locus = locus; >>>>(*slot)->discriminator = 0; >>>> } >>>> + >>>>(*slot)->discriminator++; >>>> - return (*slot)->discriminator; >>>> + return return_next ? (*slot)->discriminator >>>> +: (*slot)->discriminator - 1; >>>> } >>> >>> Won't this have the effect of sometimes incrementing the next >>> available discriminator without actually using the new value? That is, >>> if you call it once with return_next == false, and then with >>> return_next == true. >>> >>> -cary
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
Thanks, that is elegant. I will post a new patch done this way soon. Wei. On Fri, Aug 29, 2014 at 10:11 AM, Cary Coutant wrote: >> To avoid the unused new discriminator value, I added a map >> "found_call_this_line" to track whether a call is the first call seen >> on its source line when assigning discriminators. For the first call in >> a source line, its discriminator is 0. For the following calls in the >> same source line, a new discriminator will be used every time. The new >> patch is attached. Internal perf test and regression test are ok. Is >> it ok for google-4_9? > > This seems overly complex to me. I'd think all you need to do is add a > bit to locus_discrim_map (stealing a bit from discriminator ought to > be fine) that indicates whether the next call should increment the > discriminator or not. Something like this: > > increase_discriminator_for_locus (location_t locus, bool return_next) > { > ... > if (return_next || (*slot)->needs_increment) > { > (*slot)->discriminator++; > (*slot)->needs_increment = false; > } > else > (*slot)->needs_increment = true; > return (*slot)->discriminator; > } > > -cary
Re: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno
> On Fri, Aug 29, 2014 at 10:11 AM, Cary Coutant wrote: >>> To avoid the unused new discriminator value, I added a map >>> "found_call_this_line" to track whether a call is the first call in a >>> source line seen when assigning discriminators. For the first call in >>> a source line, its discriminator is 0. For the following calls in the >>> same source line, a new discriminator will be used everytime. The new >>> patch is attached. Internal perf test and regression test are ok. Is >>> it ok for google-4_9? >> >> This seems overly complex to me. I'd think all you need to do is add a >> bit to locus_discrim_map (stealing a bit from discriminator ought to >> be fine) that indicates whether the next call should increment the >> discriminator or not. Something like this: >> >> increase_discriminator_for_locus (location_t locus, bool return_next) >> { >> ... >> if (return_next || (*slot)->needs_increment) >> { >> (*slot)->discriminator++; >> (*slot)->needs_increment = false; >> } >> else >> (*slot)->needs_increment = true; >> return (*slot)->discriminator; >> } >> >> -cary Here is the new patch (attached). Regression test passes. Cary, is it ok? Thanks, Wei. ChangeLog: 2014-08-29 Wei Mi * tree-cfg.c (struct locus_discrim_map): New field needs_increment. (next_discriminator_for_locus): Increase discriminator only when return_next or needs_increment are true. (assign_discriminator): Add an actual for next_discriminator_for_locus. (assign_discriminators): Assign different discriminators for calls belonging to the same source line. Index: tree-cfg.c === --- tree-cfg.c (revision 213402) +++ tree-cfg.c (working copy) @@ -112,7 +112,14 @@ static struct cfg_stats_d cfg_stats; struct locus_discrim_map { location_t locus; - int discriminator; + /* Different calls belonging to the same source line will be assigned + different discriminators. But we want to keep the discriminator of + the first call in the same source line to be 0, in order to reduce + the .debug_line section size. needs_increment is used for this + purpose. It is initialized as false and will be set to true after + the first call is seen. */ + bool needs_increment:1; + int discriminator:31; }; /* Hashtable helpers. */ @@ -914,10 +921,15 @@ make_edges (void) /* Find the next available discriminator value for LOCUS. The discriminator distinguishes among several basic blocks that share a common locus, allowing for more accurate sample-based - profiling. */ + profiling. If RETURN_NEXT is true, return the next discriminator + anyway. If RETURN_NEXT is not true, we may not increase the + discriminator if locus_discrim_map::needs_increment is false, + which is used when the stmt is the first call stmt in current + source line. locus_discrim_map::needs_increment will be set to + true after the first call is seen. 
*/ static int -next_discriminator_for_locus (location_t locus) +next_discriminator_for_locus (location_t locus, bool return_next) { struct locus_discrim_map item; struct locus_discrim_map **slot; @@ -932,9 +944,13 @@ next_discriminator_for_locus (location_t *slot = XNEW (struct locus_discrim_map); gcc_assert (*slot); (*slot)->locus = locus; + (*slot)->needs_increment = false; (*slot)->discriminator = 0; } - (*slot)->discriminator++; + if (return_next || (*slot)->needs_increment) +(*slot)->discriminator++; + else +(*slot)->needs_increment = true; return (*slot)->discriminator; } @@ -974,7 +990,7 @@ assign_discriminator (location_t locus, if (locus == UNKNOWN_LOCATION) return; - discriminator = next_discriminator_for_locus (locus); + discriminator = next_discriminator_for_locus (locus, true); for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { @@ -1009,23 +1025,13 @@ assign_discriminators (void) for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple stmt = gsi_stmt (gsi); - if (curr_locus == UNKNOWN_LOCATION) - { + if (gimple_code (stmt) == GIMPLE_CALL) +{ curr_locus = gimple_location (stmt); - } - else if (!same_line_p (curr_locus, gimple_location (stmt))) - { - curr_locus = gimple_location (stmt); - curr_discr = 0; - } - else if (curr_discr != 0) - { + curr_discr = next_discriminator_for_locus (curr_locus, false); gimple_set_location (stmt, location_with_discriminator
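As a standalone model of the needs_increment scheme (toy code, not GCC internals): the first call on a line keeps discriminator 0, and each later call on the same line gets a fresh value, so no discriminator is ever allocated without being used:

#include <stdio.h>
#include <stdbool.h>

struct discrim_entry { bool needs_increment; int discriminator; };

/* Mirrors the patched next_discriminator_for_locus for one locus.  */
static int
next_discriminator (struct discrim_entry *e, bool return_next)
{
  if (return_next || e->needs_increment)
    e->discriminator++;
  else
    e->needs_increment = true;   /* first call: keep 0, remember it */
  return e->discriminator;
}

int
main (void)
{
  struct discrim_entry e = { false, 0 };
  /* Three calls sharing one source line, as assign_discriminators
     would see them (return_next == false for call stmts).  */
  int d1 = next_discriminator (&e, false);
  int d2 = next_discriminator (&e, false);
  int d3 = next_discriminator (&e, false);
  printf ("%d %d %d\n", d1, d2, d3);   /* prints: 0 1 2 */
  return 0;
}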
Re: [PATCH] PR58985: testcase error.
+Release manager. Thanks, committed to trunk as r204438. Ok for the 4.8 branch? On Tue, Nov 5, 2013 at 11:19 AM, Jeff Law wrote: > On 11/04/13 12:07, Wei Mi wrote: >> >> Hi, >> >> This is to fix the testcase error reported in PR58985. >> >> The intention of the testcase was to ensure there were no REG_EQUIV >> notes generated for a reg which was used in a paradoxical subreg. When >> the target was x86, a subreg was generated, so I omitted to add the >> subreg in the regexp pattern. However, there is no subreg generated for >> target cris-axis-elf, so REG_EQUIV should be allowed. >> >> Is it ok for trunk and the gcc-4.8 branch? >> >> Thanks, >> Wei Mi. >> >> 2013-11-04 Wei Mi >> >> PR regression/58985 >> * testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern. > > Fine for the trunk. Release manager's call for the branch. > > jeff >
[PATCH] Builtins handling in IVOPT
Hi, This patch works on the intrinsic calls handling issue in IVOPT mentioned here: http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html In find_interesting_uses_stmt, it changes arg = expr __builtin_xxx (arg) to arg = expr; tmp = addr_expr (mem_ref(arg)); __builtin_xxx (tmp, ...) So mem_ref(arg) can be handled by find_interesting_uses_address, an iv use of USE_ADDRESS type will be generated for expr, and then a TMR will be generated for expr in rewrite_use_address. The expand pass is changed accordingly to keep the complex addressing mode from being split for cse. With the patch we can handle the motivational case below. #include <emmintrin.h> extern __m128i arr[], d[]; void test (void) { unsigned int b; for (b = 0; b < 1000; b += 2) { __m128i *p = (__m128i *)(&d[b]); __m128i a = _mm_load_si128(&arr[4*b+3]); __m128i v = _mm_loadu_si128(p); v = _mm_xor_si128(v, a); _mm_storeu_si128(p, v); } } gcc-r204792 without the patch: .L2: movdqu (%rax), %xmm0 subq $-128, %rdx addq $32, %rax pxor -128(%rdx), %xmm0 movups %xmm0, -32(%rax) cmpq $arr+64048, %rdx jne .L2 gcc-r204792 with the patch: .L2: movdqu d(%rax), %xmm0 pxor arr+48(,%rax,4), %xmm0 addq $32, %rax movups %xmm0, d-32(%rax) cmpq $16000, %rax jne .L2 The following things are to be addressed later: 1. TER needs to be extended to handle the case when a TMR is csed. 2. For more complex cases to work, besides this patch, we also need to tune AVG_LOOP_NITER, which is now set to 5 and limits induction variable merging a lot. Increasing the parameter could remove some induction variables in critical loops in some of our benchmarks. reg pressure estimation may also need to be tuned. I will address that in a separate patch. 3. For now, the information about which param of a builtin is of memory reference type is simply listed as a switch-case in builtin_has_mem_ref_p and ix86_builtin_has_mem_ref_p. This is not ideal, but there is no infrastructure to describe it in the existing implementation. More detailed information such as parameter and call side effects will be important for more precise aliasing and may be worth some work. Maybe the refinement of this patch could be done after that. regression and bootstrap pass on x86_64-linux-gnu. ok for trunk? Thanks, Wei. 2013-11-20 Wei Mi * expr.c (expand_expr_addr_expr_1): Not to split TMR. (expand_expr_real_1): Ditto. * targhooks.c (default_builtin_has_mem_ref_p): Default builtin. * tree-ssa-loop-ivopts.c (struct iv): Add field builtin_mem_param. (alloc_iv): Ditto. (remove_unused_ivs): Ditto. (builtin_has_mem_ref_p): New function. (find_interesting_uses_stmt): Special handling of builtins. * gimple-expr.c (is_gimple_address): Add handling of TMR. * gimple-expr.h (is_gimple_addressable): Ditto. * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. (ix86_atomic_assign_expand_fenv): Ditto. (ix86_expand_special_args_builtin): Special handling of TMR for builtin. * target.def (builtin_has_mem_ref_p): New hook. * doc/tm.texi.in: Ditto. * doc/tm.texi: Generated. 2013-11-20 Wei Mi * gcc.dg/tree-ssa/ivopt_5.c: New test.
Index: expr.c === --- expr.c (revision 204792) +++ expr.c (working copy) @@ -7467,7 +7467,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t tem = fold_build_pointer_plus (tem, TREE_OPERAND (exp, 1)); return expand_expr (tem, target, tmode, modifier); } +case TARGET_MEM_REF: + { + int old_cse_not_expected; + addr_space_t as + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0; + result = addr_for_mem_ref (exp, as, true); + old_cse_not_expected = cse_not_expected; + cse_not_expected = true; + result = memory_address_addr_space (tmode, result, as); + cse_not_expected = old_cse_not_expected; +return result; + } case CONST_DECL: /* Expand the initializer like constants above. */ result = XEXP (expand_expr_constant (DECL_INITIAL (exp), @@ -9526,9 +9538,13 @@ expand_expr_real_1 (tree exp, rtx target = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0; enum insn_code icode; unsigned int align; + int old_cse_not_expected; op0 = addr_for_mem_ref (exp, as, true); + old_cse_not_expected = cse_not_expected; + cse_not_expected = true; op0 = memory_address_addr_space (mode, op0, as); + cse_not_expected = old_cse_not_expected; temp = gen_rtx_MEM (mode, op0); set_mem_attributes (temp, exp, 0); set_mem_addr_space (temp, as); Index:
Re: [PATCH] Builtins handling in IVOPT
> So what you are doing is basically not only rewriting memory references > to possibly use TARGET_MEM_REF but also address uses to use > &TARGET_MEM_REF. I think this is a good thing in general > (given instructions like x86 lea) and I would not bother distinguishing > the different kind of uses. > > Richard. > You mean to change a normal expr to the &TMR(expr) form in order to utilize x86 lea type instructions as much as possible. It is interesting. I can experiment with that idea later. I am not sure it would simply work, though. My concern is that x86 lea still has some limitations (for example, a three-operand lea has longer latency and can only be issued to port1); if we change some exprs to &TMR(expr), will it inhibit cse opportunities if codegen finds out it is not good to use lea? Thanks, Wei.
Re: [PATCH] Builtins handling in IVOPT
On Thu, Nov 21, 2013 at 1:01 PM, Richard Biener wrote: > Wei Mi wrote: >>On Thu, Nov 21, 2013 at 11:36 AM, Richard Biener >> wrote: >>> Wei Mi wrote: >>>>> So what you are doing is basically not only rewriting memory >>>>> references to possibly use TARGET_MEM_REF but also address uses to use >>>>> &TARGET_MEM_REF. I think this is a good thing in general >>>>> (given instructions like x86 lea) and I would not bother >>>>> distinguishing the different kind of uses. >>>>> >>>>> Richard. >>>>> >>>>You mean to change a normal expr to the &TMR(expr) form in order to utilize >>>>x86 lea type instructions as much as possible. It is interesting. I >>>>can experiment with that idea later. I am not sure it would simply >>>>work. My concern is that x86 lea still has some limitations (for example, a >>>>three-operand lea has longer latency and can only be issued to >>>>port1); if we change some exprs to &TMR(expr), will it inhibit cse >>>>opportunities if codegen finds out it is not good to use lea? >>> >>> That needs to be determined. Overall it might be because ivopts >>runs so early. At the rtl level there should not be big differences apart >>from better initial address computations. >>> >>> Did I misunderstand what your patch does? >>> >>> Richard. >>> >>My patch wants to address the issue that iv uses that serve as memory >>reference actuals for load/store/prefetch builtins are treated as >>non-linear iv uses instead of address iv uses, and the result of >>determine_use_iv_cost is wrong. After we change those uses to address >>uses, fewer ivs may be used, a TMR will be generated for those iv uses, >>and efficient addressing modes can be utilized. > > But are not all pointer typed uses address uses?! > > Richard. > If a pointer-typed use is simply passed by value to a function call, it is not an address use, right? But as you said, x86 lea may help here. Thanks, Wei.
Re: [PATCH] Builtins handling in IVOPT
On Thu, Nov 21, 2013 at 11:36 AM, Richard Biener wrote: > Wei Mi wrote: >>> So what you are doing is basically not only rewriting memory >>> references to possibly use TARGET_MEM_REF but also address uses to use >>> &TARGET_MEM_REF. I think this is a good thing in general >>> (given instructions like x86 lea) and I would not bother >>> distinguishing the different kind of uses. >>> >>> Richard. >>> >>You mean to change a normal expr to the &TMR(expr) form in order to utilize >>x86 lea type instructions as much as possible. It is interesting. I >>can experiment with that idea later. I am not sure it would simply work. >>My concern is that x86 lea still has some limitations (for example, a >>three-operand lea has longer latency and can only be issued to >>port1); if we change some exprs to &TMR(expr), will it inhibit cse >>opportunities if codegen finds out it is not good to use lea? > > That needs to be determined. Overall it might be because ivopts runs so > early. At the rtl level there should not be big differences apart from better > initial address computations. > > Did I misunderstand what your patch does? > > Richard. > My patch wants to address the issue that iv uses that serve as memory reference actuals for load/store/prefetch builtins are treated as non-linear iv uses instead of address iv uses, and the result of determine_use_iv_cost is wrong. After we change those uses to address uses, fewer ivs may be used, a TMR will be generated for those iv uses, and efficient addressing modes can be utilized. Thanks, Wei.
Re: [PATCH] Builtins handling in IVOPT
On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak wrote: > Hi, > >> This patch works on the intrinsic calls handling issue in IVOPT mentioned >> here: >> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >> >> In find_interesting_uses_stmt, it changes >> >> arg = expr >> __builtin_xxx (arg) >> >> to >> >> arg = expr; >> tmp = addr_expr (mem_ref(arg)); >> __builtin_xxx (tmp, ...) > > this looks a bit confusing (and wasteful) to me. It would make more sense to > just record the argument as USE_ADDRESS and do the rewriting in > rewrite_use_address. > > Zdenek My intention was to use find_interesting_uses_address directly. But you are right, the logic is cleaner if the rewriting is done only in rewrite_use_address. I will change it. Thanks, Wei.
Re: [PATCH] Builtins handling in IVOPT
Thanks for the comments. Regards, Wei. On Thu, Nov 21, 2013 at 12:48 AM, Bin.Cheng wrote: > I don't know very much about the problem but willing to study since I > am looking into IVO recently :) > >> --- tree-ssa-loop-ivopts.c (revision 204792) >> +++ tree-ssa-loop-ivopts.c (working copy) >> @@ -135,6 +135,8 @@ struct iv >>tree ssa_name; /* The ssa name with the value. */ >>bool biv_p; /* Is it a biv? */ >>bool have_use_for; /* Do we already have a use for it? */ >> + bool builtin_mem_param; /* Used as param of a builtin, so it could not be >> +removed by remove_unused_ivs. */ > > As comment below, address parameter may be not limited to builtin > function only, how about a variable name more generic? Ok, I will change it to a different name. > >>unsigned use_id; /* The identifier in the use if it is the case. */ >> }; >> >> @@ -952,6 +954,7 @@ alloc_iv (tree base, tree step) >>iv->step = step; >>iv->biv_p = false; >>iv->have_use_for = false; >> + iv->builtin_mem_param = false; >>iv->use_id = 0; >>iv->ssa_name = NULL_TREE; >> >> @@ -1874,13 +1877,36 @@ find_invariants_stmt (struct ivopts_data >> } >> } >> >> +/* Find whether the Ith param of the BUILTIN is a mem >> + reference. If I is -1, it returns whether the BUILTIN >> + contains any mem reference type param. */ >> + >> +static bool >> +builtin_has_mem_ref_p (gimple builtin, int i) >> +{ >> + tree fndecl = gimple_call_fndecl (builtin); >> + if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) >> +{ >> + switch (DECL_FUNCTION_CODE (fndecl)) >> + { >> + case BUILT_IN_PREFETCH: >> + if (i == -1 || i == 0) >> +return true; >> + } > This switch looks strange, could be refactored I think. > I am not sure if there will be more builtins coming in the func, so I use switch case here. >> +} >> + else if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD) >> +return targetm.builtin_has_mem_ref_p ((int) DECL_FUNCTION_CODE >> (fndecl), i); >> + >> + return false; >> +} >> + >> /* Finds interesting uses of induction variables in the statement STMT. */ >> >> static void >> find_interesting_uses_stmt (struct ivopts_data *data, gimple stmt) >> { >>struct iv *iv; >> - tree op, *lhs, *rhs; >> + tree op, *lhs, *rhs, callee; >>ssa_op_iter iter; >>use_operand_p use_p; >>enum tree_code code; >> @@ -1937,6 +1963,74 @@ find_interesting_uses_stmt (struct ivopt >> >> call (memory). */ >> } >> + else if (is_gimple_call (stmt) >> + && (callee = gimple_call_fndecl (stmt)) >> + && is_builtin_fn (callee) >> + && builtin_has_mem_ref_p (stmt, -1)) >> +{ > > I noticed the preceding comments about call(memory), is your change a > specific case of the mention one? > Yes, I will fix it. >> + size_t i; >> + for (i = 0; i < gimple_call_num_args (stmt); i++) >> + { >> + if (builtin_has_mem_ref_p (stmt, i)) >> + { >> + gimple def, g; >> + gimple_seq seq = NULL; >> + tree type, mem, addr, rhs; >> + tree *arg = gimple_call_arg_ptr (stmt, i); >> + location_t loc = gimple_location (stmt); >> + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); >> + >> + if (TREE_CODE (*arg) != SSA_NAME) >> + continue; >> + >> + def = SSA_NAME_DEF_STMT (*arg); >> + gcc_assert (gimple_code (def) == GIMPLE_PHI >> + || is_gimple_assign (def)); >> + /* Suppose we have the case: >> + arg = expr; >> + call (arg) >> +If the expr is not like the form: &MEM(...), change it to: >> + arg = expr; >> + tmp = &MEM(arg); >> + call(tmp); >> +then try to find interesting uses address in MEM(arg). 
*/ >> + if (is_gimple_assign (def) >> + && (rhs = gimple_assign_rhs1(def)) >> + && TREE_CODE (rhs) == ADDR_EXPR >> + && REFERENCE_CLASS_P (TREE_OPERAND (rhs, 0))) >> + { >> + iv = get_iv (data, *arg); >> + if (iv && !iv->builtin_mem_param) >> + iv->builtin_mem_param = true; >> + >> + find_interesting_uses_address (data, def, >> +&TREE_OPERAND (rhs, 0)); >> + } >> + else >> + { >> + mem = build2 (MEM_REF, TREE_TYPE (*arg), *arg, >> + build_int_cst (TREE_TYPE (*arg), 0)); >> + type = build_pointer_type (TREE_TYPE (*arg)); >> + addr = build1 (ADDR_EXPR, type, mem); >> + g = gimple_build_assign_with_ops (ADDR_EXPR, >> + make_ssa_name
Re: [PATCH] Builtins handling in IVOPT
On Fri, Nov 22, 2013 at 6:11 AM, Zdenek Dvorak wrote: > Hi, > >> >> > If a pointer typed use is plainly value passed to a func call, it is >> >> > not an address use, right? But as you said, x86 lea may help here. >> >> >> >> But that's what you are matching ... (well, for builtins you know >> >> will expand that to a memory reference). >> >> >> >> What I dislike in the patch is the special-casing of some builtins >> >> via a target hook. I'd rather say treat all internal functions and >> >> all target builtins that way. Or simply all addresses. >> > >> > unless the architecture has lea-type instruction to compute the address, >> > computing say b+4*i incurs some cost, while if mem[b+4*i] is accessed, the >> > computation is for free. Thus, it does not make sense to treat the address >> > computations the same way as memory references (or to treat all functions >> > the same way as builtins which translate to memory references), >> >> I understand that, but I think the patch tries to improve code generation >> not by changing the set of IVs used (thus adjust cost considerations) >> but by changing the way it rewrites certain address uses. > > actually, the costs are adjusted in the patch -- the address calculations > for the handled builtins are recorded as USE_ADDRESS (not as > USE_NONLINEAR_EXPR), > so that their costs are calculated in the same way as for memory references, > > Zdenek Sorry for not making it clear. Yes, the costs are adjusted in the patch. Although I changed the expr to addr_expr(mem(expr)) pattern, the intention was only to avoid changing find_interesting_uses_address and rewrite_use_address, which Zdenek and Bin.Cheng thought was not good. I am changing this part. Thanks, Wei.
Re: [PATCH] Builtins handling in IVOPT
> I think the problem can be shown by the example below: > struct tag > { > int a[10]; > int b; > }; > struct tag s; > int foo(int len) > { > int i = 0; > int sum = 0; > for (i = 0; i < len; i++) > sum += barr (&s.a[i]); > > return sum; > } > The dump before IVOPT is like: > > : > # i_16 = PHI > # sum_17 = PHI > _6 = &s.a[i_16]; > _8 = barr (_6); > sum_9 = _8 + sum_17; > i_10 = i_16 + 1; > if (len_5(D) > i_10) > goto ; > else > goto ; > > : > # sum_11 = PHI > goto ; > > : > goto ; > > The iv use of _6 in barr (_6) is actually a memory address and it can > be computed efficiently for some targets. For now IVOPT only honors > address type iv uses that appear in memory accesses, so this patch is > trying to handle this kind of address iv, which is outside of a memory > access, just the same. Please correct me if I am wrong. Yes, that is correct. > > After thinking about it twice, I have two concerns: > 1) Why can't we just enhance the nonlinear use logic to handle this > kind of iv use? It's more complicated to handle it as a normal > address type iv; consider that IVOPT adds auto-increment candidates > according to them, you have to deal with that in this way > (auto-increment addressing mode can't apply to this kind of address iv > use). I think the existing address iv use logic is enough to handle it. I am changing it and moving the gimple change from find_interesting_uses_stmt to rewrite_use_address in the original patch. > 2) If I understand it right, this is an issue not only limited to > builtin functions, it stands for normal function calls too, right? > For a builtin function such as _mm_loadu_si128(b+4*i), the call will be expanded to an insn MOVDQU mem[b+4*i], %xmmreg, and the computation of b+4*i comes for free. But for a normal function call, b+4*i will only be used as the value of an actual argument, and its computation cost cannot be avoided. Thanks, Wei.
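A small compilable illustration of that contrast; use and plain_call are hypothetical externals, not part of the patch or testcase:

#include <emmintrin.h>

extern void use (__m128i v);
extern void plain_call (const __m128i *p);

/* For the builtin load, the address b + 4*i can be folded into the
   instruction's memory operand, so treating it as an address iv use is
   right.  For plain_call the address must be materialized in a register
   to be passed as an argument, so its computation cost is real.  */
void
contrast (const __m128i *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      use (_mm_loadu_si128 (b + 4 * i));   /* address iv use */
      plain_call (b + 4 * i);              /* plain value use */
    }
}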
Re: [PATCH] Builtins handling in IVOPT
On Fri, Nov 22, 2013 at 9:21 AM, Wei Mi wrote: >> I think the problem can be shown by the example below: >> struct tag >> { >> int a[10]; >> int b; >> }; >> struct tag s; >> int foo(int len) >> { >> int i = 0; >> int sum = 0; >> for (i = 0; i < len; i++) >> sum += barr (&s.a[i]); >> >> return sum; >> } >> The dump before IVOPT is like: >> >> : >> # i_16 = PHI >> # sum_17 = PHI >> _6 = &s.a[i_16]; >> _8 = barr (_6); >> sum_9 = _8 + sum_17; >> i_10 = i_16 + 1; >> if (len_5(D) > i_10) >> goto ; >> else >> goto ; >> >> : >> # sum_11 = PHI >> goto ; >> >> : >> goto ; >> >> The iv use of _6 in barr (_6) is actually a memory address and it can >> be computed efficiently for some targets. For now IVOPT only honors >> address type iv uses that appear in memory accesses, so this patch is >> trying to handle this kind of address iv which is outside of a memory >> access just the same. Please correct me if I am wrong. > > Yes, that is correct. > Sorry, let me make a correction here. That is not what my patch is doing. The patch does not handle normal address exprs, only those exprs that can be expressed as mem accesses after the builtins are expanded. >> >> After thinking about it twice, I have two concerns: >> 1) Why can't we just enhance the nonlinear use logic to handle this >> kind of iv use? It's more complicated to handle it as a normal >> address type iv; consider that IVOPT adds auto-increment candidates >> according to them, you have to deal with that in this way >> (auto-increment addressing mode can't apply to this kind of address iv >> use). > > I think the existing address iv use logic is enough to handle it. I am > changing it and moving the gimple change from > find_interesting_uses_stmt to rewrite_use_address in the original patch. > >> 2) If I understand it right, this is an issue not only limited to >> builtin functions, it stands for normal function calls too, right? >> > > For a builtin function such as _mm_loadu_si128(b+4*i), the call will be > expanded to an insn MOVDQU mem[b+4*i], %xmmreg, and the computation > of b+4*i comes for free. But for a normal function call, b+4*i will only be > used as the value of an actual argument, and its computation cost cannot be > avoided. > > Thanks, > Wei.
Re: [PATCH] Builtins handling in IVOPT
On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak wrote: > Hi, > >> This patch works on the intrinsic calls handling issue in IVOPT mentioned >> here: >> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >> >> In find_interesting_uses_stmt, it changes >> >> arg = expr >> __builtin_xxx (arg) >> >> to >> >> arg = expr; >> tmp = addr_expr (mem_ref(arg)); >> __builtin_xxx (tmp, ...) > > this looks a bit confusing (and wasteful) to me. It would make more sense to > just record the argument as USE_ADDRESS and do the rewriting in > rewrite_use_address. > > Zdenek I updated the patch. The gimple changing part is now moved to rewrite_use_address. Add support for plain address expr in addition to reference expr in find_interesting_uses_address. bootstrap and testing is going on. 2013-11-22 Wei Mi * expr.c (expand_expr_addr_expr_1): Not to split TMR. (expand_expr_real_1): Ditto. * targhooks.c (default_builtin_has_mem_ref_p): Default builtin. * tree-ssa-loop-ivopts.c (builtin_has_mem_ref_p): New function. (rewrite_use_address): Add TMR for builtin. (find_interesting_uses_stmt): Special handling of builtins. * gimple-expr.c (is_gimple_address): Add handling of TMR. * gimple-expr.h (is_gimple_addressable): Ditto. * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. (ix86_atomic_assign_expand_fenv): Ditto. (ix86_expand_special_args_builtin): Special handling of TMR for builtin. * target.def (builtin_has_mem_ref_p): New hook. * doc/tm.texi.in: Ditto. * doc/tm.texi: Generated. 2013-11-22 Wei Mi * gcc.dg/tree-ssa/ivopt_5.c: New test. Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c === --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) @@ -0,0 +1,21 @@ +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */ + +/* Make sure only one iv is selected after IVOPT. 
*/ + +#include +extern __m128i arr[], d[]; +void test (void) +{ +unsigned int b; +for (b = 0; b < 1000; b += 2) { + __m128i *p = (__m128i *)(&d[b]); + __m128i a = _mm_load_si128(&arr[4*b+3]); + __m128i v = _mm_loadu_si128(p); + v = _mm_xor_si128(v, a); + _mm_storeu_si128(p, v); +} +} + +/* { dg-final { scan-tree-dump-times "PHI stmt); tree base_hint = NULL_TREE; - tree ref, iv; + tree ref, iv, callee; bool ok; adjust_iv_update_pos (cand, use); @@ -6383,10 +6437,41 @@ rewrite_use_address (struct ivopts_data base_hint = var_at_stmt (data->current_loop, cand, use->stmt); iv = var_at_stmt (data->current_loop, cand, use->stmt); - ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, - reference_alias_ptr_type (*use->op_p), - iv, base_hint, data->speed); - copy_ref_info (ref, *use->op_p); + + /* For builtin_call(addr_expr), change it to: + tmp = ADDR_EXPR(TMR(...)); + call (tmp); */ + if (is_gimple_call (use->stmt) + && (callee = gimple_call_fndecl (use->stmt)) + && is_builtin_fn (callee)) +{ + gimple g; + gimple_seq seq = NULL; + tree addr; + tree type = TREE_TYPE (TREE_TYPE (*use->op_p)); + location_t loc = gimple_location (use->stmt); + gimple_stmt_iterator gsi = gsi_for_stmt (use->stmt); + + ref = create_mem_ref (&bsi, type, &aff, + TREE_TYPE (*use->op_p), + iv, base_hint, data->speed); + addr = build1 (ADDR_EXPR, TREE_TYPE (*use->op_p), ref); + g = gimple_build_assign_with_ops (ADDR_EXPR, + make_ssa_name (TREE_TYPE (*use->op_p), NULL), + addr, NULL); + gimple_set_location (g, loc); + gimple_seq_add_stmt_without_update (&seq, g); + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + + ref = gimple_assign_lhs (g); +} + else +{ + ref = create_mem_ref (&bsi, TREE_TYPE (*use->op_p), &aff, + reference_alias_ptr_type (*use->op_p), + iv, base_hint, data->speed); + copy_ref_info (ref, *use->op_p); +} *use->op_p = ref; } Index: config/i386/i386.c === --- config/i386/i386.c (revision 204792) +++ config/i386/i386.c (working copy) @@ -29639,6 +29639,50 @@ ix86_init_mmx_sse_builtins (void) } } +/* Return whether the Ith param of the BUILTIN_FUNCTION + is a memory reference. If I == -1, return whether th
Re: [PATCH] Builtins handling in IVOPT
bootstrap and regression of the updated patch pass. On Sat, Nov 23, 2013 at 12:05 AM, Wei Mi wrote: > On Thu, Nov 21, 2013 at 12:19 AM, Zdenek Dvorak > wrote: >> Hi, >> >>> This patch works on the intrinsic calls handling issue in IVOPT mentioned >>> here: >>> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01295.html >>> >>> In find_interesting_uses_stmt, it changes >>> >>> arg = expr >>> __builtin_xxx (arg) >>> >>> to >>> >>> arg = expr; >>> tmp = addr_expr (mem_ref(arg)); >>> __builtin_xxx (tmp, ...) >> >> this looks a bit confusing (and wasteful) to me. It would make more sense to >> just record the argument as USE_ADDRESS and do the rewriting in >> rewrite_use_address. >> >> Zdenek > > I updated the patch. The gimple changing part is now moved to > rewrite_use_address. Add support for plain address expr in addition to > reference expr in find_interesting_uses_address. > > bootstrap and testing is going on. > > 2013-11-22 Wei Mi > > * expr.c (expand_expr_addr_expr_1): Not to split TMR. > (expand_expr_real_1): Ditto. > * targhooks.c (default_builtin_has_mem_ref_p): Default > builtin. > * tree-ssa-loop-ivopts.c (builtin_has_mem_ref_p): New function. > (rewrite_use_address): Add TMR for builtin. > (find_interesting_uses_stmt): Special handling of builtins. > * gimple-expr.c (is_gimple_address): Add handling of TMR. > * gimple-expr.h (is_gimple_addressable): Ditto. > * config/i386/i386.c (ix86_builtin_has_mem_ref_p): New target hook. > (ix86_atomic_assign_expand_fenv): Ditto. > (ix86_expand_special_args_builtin): Special handling of TMR for > builtin. > * target.def (builtin_has_mem_ref_p): New hook. > * doc/tm.texi.in: Ditto. > * doc/tm.texi: Generated. > > 2013-11-22 Wei Mi > > * gcc.dg/tree-ssa/ivopt_5.c: New test. > > Index: testsuite/gcc.dg/tree-ssa/ivopt_5.c > === > --- testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/ivopt_5.c (revision 0) > @@ -0,0 +1,21 @@ > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ > +/* { dg-options "-O2 -m64 -fdump-tree-ivopts-details" } */ > + > +/* Make sure only one iv is selected after IVOPT. 
*/ > + > +#include > +extern __m128i arr[], d[]; > +void test (void) > +{ > +unsigned int b; > +for (b = 0; b < 1000; b += 2) { > + __m128i *p = (__m128i *)(&d[b]); > + __m128i a = _mm_load_si128(&arr[4*b+3]); > + __m128i v = _mm_loadu_si128(p); > + v = _mm_xor_si128(v, a); > + _mm_storeu_si128(p, v); > +} > +} > + > +/* { dg-final { scan-tree-dump-times "PHI +/* { dg-final { cleanup-tree-dump "ivopts" } } */ > Index: targhooks.c > === > --- targhooks.c (revision 204792) > +++ targhooks.c (working copy) > @@ -566,6 +566,13 @@ default_builtin_reciprocal (unsigned int > } > > bool > +default_builtin_has_mem_ref_p (int built_in_function ATTRIBUTE_UNUSED, > + int i ATTRIBUTE_UNUSED) > +{ > + return false; > +} > + > +bool > hook_bool_CUMULATIVE_ARGS_mode_tree_bool_false ( > cumulative_args_t ca ATTRIBUTE_UNUSED, > enum machine_mode mode ATTRIBUTE_UNUSED, > Index: expr.c > === > --- expr.c (revision 204792) > +++ expr.c (working copy) > @@ -7467,7 +7467,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t > tem = fold_build_pointer_plus (tem, TREE_OPERAND (exp, 1)); > return expand_expr (tem, target, tmode, modifier); >} > +case TARGET_MEM_REF: > + { > + int old_cse_not_expected; > + addr_space_t as > + = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (exp, 0; > > + result = addr_for_mem_ref (exp, as, true); > + old_cse_not_expected = cse_not_expected; > + cse_not_expected = true; > + result = memory_address_addr_space (tmode, result, as); > + cse_not_expected = old_cse_not_expected; > + return result; > + } > case CONST_DECL: >/* Expand the initializer like constants above. */ >result = XEXP (expand_expr_constant (DECL_INITIAL (exp), > @@ -9526,9 +9538,13 @@ expand_expr_real_1 (tree exp, rtx target > = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (T
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
Sorry about the problem. For the failed testcase, it was compiled using -fmodulo-sched. The modulo-sched phase sets SCHED_GROUP_P of a jump insn to true, which means the jump insn should be scheduled with the previous insn as a group. When modulo scheduling is finished, the SCHED_GROUP_P flag is not cleaned up. After that, the pass_jump2 phase splits the bb and moves the previous insn to another bb. Then pass_sched2 sees the flag and mistakenly tries to bind the jump insn with a code label. I think other cases that set SCHED_GROUP_P should have the same problem because SCHED_GROUP_P is not cleaned up after scheduling is done. The flag may become inconsistent after some optimizations and may cause problems if it is used by later scheduling passes. I don't know why a similar problem was never exposed before. The fix is to simply clean up the SCHED_GROUP_P flag in sched_finish. bootstrap is ok. regression test is going on. Is it ok if regression passes? Thanks, Wei. 2013-11-23 Wei Mi PR rtl-optimization/59020 * haifa-sched.c (cleanup_sched_group): New function. (sched_finish): Call cleanup_sched_group to cleanup SCHED_GROUP_P. 2013-11-23 Wei Mi PR rtl-optimization/59020 * testsuite/gcc.dg/pr59020.c (void f): Index: haifa-sched.c === --- haifa-sched.c (revision 204923) +++ haifa-sched.c (working copy) @@ -6598,6 +6598,23 @@ group_insns_for_macro_fusion () try_group_insn (BB_END (bb)); } +/* Cleanup SCHED_GROUP_P after scheduling is done. This is necessary because + bbs may be changed by other optimizations and the flag from the last + scheduling may become invalid. If a later scheduler pass sees the flag + left by the last scheduling, it may produce incorrect results. */ + +static void +cleanup_sched_group () +{ + basic_block bb; + rtx insn; + + FOR_EACH_BB (bb) +FOR_BB_INSNS(bb, insn) + if (INSN_P (insn) && SCHED_GROUP_P (insn)) + SCHED_GROUP_P (insn) = 0; +} + /* Initialize some global state for the scheduler. This function works with the common data shared between all the schedulers. It is called from the scheduler specific initialization routine. */ @@ -6841,6 +6858,8 @@ sched_finish (void) } free (curr_state); + cleanup_sched_group (); + if (targetm.sched.finish_global) targetm.sched.finish_global (sched_dump, sched_verbose); Index: testsuite/gcc.dg/pr59020.c === --- testsuite/gcc.dg/pr59020.c (revision 0) +++ testsuite/gcc.dg/pr59020.c (revision 0) @@ -0,0 +1,15 @@ +/* PR rtl-optimization/59020 */ + +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ +/* { dg-options "-O2 -fmodulo-sched -fno-inline -march=corei7" } */ + +int a, b, d; +unsigned c; + +void f() +{ + unsigned q; + for(; a; a++) +if(((c %= d && 1) ? : 1) & 1) + for(; b; q++); +} On Sat, Nov 23, 2013 at 4:34 PM, H.J. Lu wrote: > On Mon, Nov 4, 2013 at 1:51 PM, Wei Mi wrote: >> Thanks! The three patches are committed as r204367, r204369 and r204371. >> > > r204369 caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59020 > > -- > H.J.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Mon, Nov 25, 2013 at 2:08 AM, Alexander Monakov wrote: > On Sat, 23 Nov 2013, Wei Mi wrote: >> For the failed testcase, it was compiled using -fmodulo-sched. >> The modulo-sched phase sets SCHED_GROUP_P of a jump insn to true, which >> means the jump insn should be scheduled with the previous insn as a group. > > SMS doesn't set SCHED_GROUP_P by itself; did you mean that SCHED_GROUP_P is > set by dependency analysis code similar to sched2? > SCHED_GROUP_P is set in sched_analyze for the "call + return value" group and other groups, and in sched_init for macro-fusion. Both sched_analyze and sched_init are used by SMS (sched_analyze is used when creating the ddg). I think sched1 may have the same problem when it sets SCHED_GROUP_P and sched2 uses it. >> When modulo scheduling is finished, the SCHED_GROUP_P flag is not >> cleaned up. After that, the pass_jump2 phase splits the bb and moves the >> previous insn to another bb. Then pass_sched2 sees the flag and mistakenly >> tries to bind the jump insn with a code label. > > I think the analysis is incomplete. Looking at the backtrace posted in the > bug report, the failure path goes through chain_to_prev_insn, which protects > against such failure: > > prev_nonnote = prev_nonnote_nondebug_insn (insn); > if (BLOCK_FOR_INSN (insn) == BLOCK_FOR_INSN (prev_nonnote) > && ! sched_insns_conditions_mutex_p (insn, prev_nonnote)) > add_dependence (insn, prev_nonnote, REG_DEP_ANTI); > > Why does it end up with a label at the assertion failure point? > > Alexander Because a code label is not a note or a debug insn, so prev_nonnote_nondebug_insn can return it. I think it is impossible to detect such an inconsistency in chain_to_prev_insn. If the prev_nonnote is not a code label, the bug will simply not be exposed this time. If some other optimization inserts a real insn before a jump marked as SCHED_GROUP_P, the following scheduler pass will silently schedule them together. That is why I think it is necessary to clean up SCHED_GROUP_P when a scheduling pass is finished.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Mon, Nov 25, 2013 at 10:36 AM, Jeff Law wrote:
> On 11/24/13 00:30, Wei Mi wrote:
>> Sorry about the problem.
>>
>> For the failed testcase, it was compiled using -fmodulo-sched. The
>> modulo-sched phase sets SCHED_GROUP_P of a jump insn to true, which
>> means the jump insn should be scheduled with the previous insn as a
>> group. When modulo scheduling is finished, the SCHED_GROUP_P flag is
>> not cleaned up. After that, the pass_jump2 phase splits the bb and
>> moves the previous insn to another bb. Then pass_sched2 sees the flag
>> and mistakenly tries to bind the jump insn with a code label.
>>
>> I suspect other cases that set SCHED_GROUP_P have the same problem,
>> because SCHED_GROUP_P is not cleaned up after scheduling is done. The
>> flag may become inconsistent after some optimizations and may cause
>> problems if it is used by later scheduling passes. I don't know why a
>> similar problem was never exposed before.
>>
>> The fix is to simply clean up the SCHED_GROUP_P flag in sched_finish.
>
> I think this is showing up because this is the first time we have used
> SCHED_GROUP_P in cases where we merely want to keep two instructions
> consecutive vs cases where we are required to keep certain instructions
> consecutive. For example, all the RTL passes already know they need to keep
> a cc0 setter and cc0 user consecutive on a HAVE_cc0 target.
>
> In the latter case passes should already be doing what is necessary to keep
> those instructions consecutive. In the former case, we'd have to audit &
> fix passes to honor the desire to keep certain instructions consecutive.

I see. Thanks for showing me the reason.

>> Bootstrap is ok. Regression testing is in progress. Is it ok if
>> regression passes?
>>
>> Thanks,
>> Wei.
>>
>> 2013-11-23  Wei Mi
>>
>> 	PR rtl-optimization/59020
>> 	* haifa-sched.c (cleanup_sched_group): New function.
>> 	(sched_finish): Call cleanup_sched_group to clean up
>> 	SCHED_GROUP_P.
>>
>> 2013-11-23  Wei Mi
>>
>> 	PR rtl-optimization/59020
>> 	* testsuite/gcc.dg/pr59020.c (void f): New test.
>
> I'll note you're doing an extra pass over all the RTL here. Is there any
> clean way you can clean SCHED_GROUP_P without that extra pass over the RTL?
> Perhaps when the group actually gets scheduled?
>
> jeff

Now that I understand, with your help, that sched groups will not be broken by other passes in the other cases, I can clean up SCHED_GROUP_P for macro-fusion only, by checking every condjump insn at the end of a bb. The cost is then proportional to the number of bbs rather than the number of insns. Do you think that is ok?

Thanks,
Wei.

2013-11-25  Wei Mi

	PR rtl-optimization/59020
	* haifa-sched.c (cleanup_sched_group): New function.
	(sched_finish): Call cleanup_sched_group to clean up SCHED_GROUP_P.

2013-11-25  Wei Mi

	PR rtl-optimization/59020
	* testsuite/gcc.dg/pr59020.c (void f): New test.

Index: haifa-sched.c
===
--- haifa-sched.c (revision 204923)
+++ haifa-sched.c (working copy)
@@ -6598,6 +6598,25 @@ group_insns_for_macro_fusion ()
     try_group_insn (BB_END (bb));
 }

+/* Clean up SCHED_GROUP_P after scheduling is done.  This is necessary
+   because a bb may be changed by other optimizations and the flag set by
+   the last scheduling may become invalid.  If a later scheduler pass sees
+   a stale flag, it may produce an incorrect result.  */
+
+static void
+cleanup_sched_group ()
+{
+  basic_block bb;
+  rtx insn;
+
+  FOR_EACH_BB (bb)
+    {
+      insn = BB_END (bb);
+      if (INSN_P (insn) && SCHED_GROUP_P (insn))
+	SCHED_GROUP_P (insn) = 0;
+    }
+}
+
 /* Initialize some global state for the scheduler.  This function works
    with the common data shared between all the schedulers.  It is called
    from the scheduler specific initialization routine.  */
@@ -6841,6 +6860,8 @@ sched_finish (void)
     }
   free (curr_state);

+  cleanup_sched_group ();
+
   if (targetm.sched.finish_global)
     targetm.sched.finish_global (sched_dump, sched_verbose);

Index: testsuite/gcc.dg/pr59020.c
===
--- testsuite/gcc.dg/pr59020.c (revision 0)
+++ testsuite/gcc.dg/pr59020.c (revision 0)
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/59020 */
+
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-O2 -fmodulo-sched -fno-inline -march=corei7" } */
+
+int a, b, d;
+unsigned c;
+
+void f()
+{
+  unsigned q;
+  for(; a; a++)
+    if(((c %= d && 1) ? : 1) & 1)
+      for(; b; q++);
+}
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Mon, Nov 25, 2013 at 11:25 AM, Jeff Law wrote:
> On 11/25/13 12:16, Wei Mi wrote:
>>> I'll note you're doing an extra pass over all the RTL here. Is there
>>> any clean way you can clean SCHED_GROUP_P without that extra pass over
>>> the RTL? Perhaps when the group actually gets scheduled?
>>>
>>> jeff
>>
>> Now that I understand, with your help, that sched groups will not be
>> broken by other passes in the other cases, I can clean up SCHED_GROUP_P
>> for macro-fusion only, by checking every condjump insn at the end of a
>> bb. The cost is then proportional to the number of bbs. Do you think
>> that is ok?
>>
>> Thanks,
>> Wei.
>>
>> 2013-11-25  Wei Mi
>>
>> 	PR rtl-optimization/59020
>> 	* haifa-sched.c (cleanup_sched_group): New function.
>> 	(sched_finish): Call cleanup_sched_group to clean up
>> 	SCHED_GROUP_P.
>>
>> 2013-11-25  Wei Mi
>>
>> 	PR rtl-optimization/59020
>> 	* testsuite/gcc.dg/pr59020.c (void f): New test.
>
> But there's nothing that requires the SCHED_GROUP_P to be at the end of a
> block. The cc0-setter/cc0-user case was just an example. Another example
> would be groups created around call insns on small register class machines.

Doing the cleanup at the end of each bb ensures that all the groups inserted for macro-fusion will be cleaned. For groups not at the end of a block, it does not matter whether they are cleaned up or not, because other passes will not break up those groups -- you said cc0-setter/cc0-user was such a case. Is the call group a different case?

> ISTM that when an insn moves from the ready list back to the main insn
> chain, you can just clear SCHED_GROUP_P at that time. Is that not the
> case?
>
> jeff

For sched1 and sched2, we can do that. Actually, I found it has already been done in move_insn during commit_schedule. But for modulo scheduling, I haven't found a good place to do it.

Thanks,
Wei.
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Mon, Nov 25, 2013 at 2:12 PM, Jeff Law wrote:
>> Doing the cleanup at the end of each bb ensures that all the groups
>> inserted for macro-fusion will be cleaned. For groups not at the end of
>> a block, it does not matter whether they are cleaned up or not, because
>> other passes will not break up those groups -- you said
>> cc0-setter/cc0-user was such a case. Is the call group a different
>> case?
>
> True, it would be safe, but it seems inconsistent and confusing that some
> SCHED_GROUP_P references would be purged and others remain.
>
> Given SCHED_GROUP_P is to be used strictly in the scheduler, ISTM that we
> should be wiping it as we leave and that our RTL checkers ought to be
> verifying there are no insns with SCHED_GROUP_P left on.

How about adding a verifier TODO_verify_sched_group_flag, similar to TODO_verify_rtl_sharing, and adding it to the todo lists of all the scheduling passes?

>> For sched1 and sched2, we can do that. Actually, I found it has already
>> been done in move_insn during commit_schedule. But for modulo
>> scheduling, I haven't found a good place to do it.
>
> Well, that's where I'd suggest focusing attention.
>
> jeff

After looking at it more carefully, even for sched1 and sched2 it is not ok to depend on move_insn in commit_schedule to clean up all the SCHED_GROUP_P flags: if a block is skipped by dbg_cnt and therefore never scheduled, the SCHED_GROUP_P flags inside that block will not be cleaned. It is even more difficult to find a place inside SMS scheduling to do the cleanup. Any suggestions?

Thanks,
Wei.
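P.S. To make the verifier proposal concrete, here is a minimal sketch of what its body might look like, using the usual CFG walk macros. The TODO_verify_sched_group_flag name is only my proposal, and this helper is hypothetical, not part of any posted patch:

  /* Hypothetical verifier: assert that no insn still carries
     SCHED_GROUP_P once a scheduling pass has finished.  */

  static void
  verify_sched_group_cleared (void)
  {
    basic_block bb;
    rtx insn;

    FOR_EACH_BB (bb)
      FOR_BB_INSNS (bb, insn)
        if (INSN_P (insn))
          gcc_assert (!SCHED_GROUP_P (insn));
  }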
[PATCH PR64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr
Hi,

The patch is to address the bug here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557

The patch calls get_addr on a VALUE before forming a mem_addr expr from the VALUE and an offset. This avoids the problem that get_addr can handle a bare VALUE but cannot handle an expr like (VALUE + offset). With the fix, find_base_term can always recover the base of the original address.

Bootstrap and regression testing on x86_64-linux-gnu are ok. Regression tests on aarch64-linux-gnu and powerpc64-linux-gnu are also ok. Is it ok for trunk?

Thanks,
Wei.

gcc/ChangeLog:

2015-01-21  Wei Mi

	* dse.c (record_store): Call get_addr for mem_addr.
	(check_mem_read_rtx): Likewise.

Index: gcc/dse.c
===
--- gcc/dse.c (revision 219975)
+++ gcc/dse.c (working copy)
@@ -1575,6 +1575,7 @@ record_store (rtx body, bb_info_t bb_inf
	    = rtx_group_vec[group_id];
	  mem_addr = group->canon_base_addr;
	}
+      mem_addr = get_addr (mem_addr);
       if (offset)
	mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
     }
@@ -2188,6 +2189,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
	    = rtx_group_vec[group_id];
	  mem_addr = group->canon_base_addr;
	}
+      mem_addr = get_addr (mem_addr);
       if (offset)
	mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
     }
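P.S. To make the ordering concrete, here is the fixed sequence again with explanatory comments added. The variable names are as in dse.c; the RTL shapes mentioned in the comments are illustrative, not taken from the PR:

  /* mem_addr may still be a bare (value V) at this point, which is
     the only shape get_addr knows how to canonicalize.  */
  mem_addr = get_addr (mem_addr);
  /* Only then fold in the constant offset.  If we formed
     (plus (value V) (const_int N)) first, get_addr would not look
     through it and find_base_term could not recover the base.  */
  if (offset)
    mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);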
Re: [PATCH PR64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr
Thanks for the review. Comments addressed and patch committed. The problem exists on gcc-4_9 too. Is it ok for gcc-4_9-branch? I will wait another day before committing it to gcc-4_9 if it is ok.

Thanks,
Wei.

On Thu, Jan 22, 2015 at 9:39 AM, Jeff Law wrote:
> On 01/21/15 15:32, Wei Mi wrote:
>> Hi,
>>
>> The patch is to address the bug here:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557
>>
>> The patch calls get_addr on a VALUE before forming a mem_addr expr from
>> the VALUE and an offset. This avoids the problem that get_addr can
>> handle a bare VALUE but cannot handle an expr like (VALUE + offset).
>> With the fix, find_base_term can always recover the base of the
>> original address.
>>
>> Bootstrap and regression testing on x86_64-linux-gnu are ok. Regression
>> tests on aarch64-linux-gnu and powerpc64-linux-gnu are also ok. Is it
>> ok for trunk?
>>
>> Thanks,
>> Wei.
>>
>> gcc/ChangeLog:
>>
>> 2015-01-21  Wei Mi
>>
>> 	* dse.c (record_store): Call get_addr for mem_addr.
>> 	(check_mem_read_rtx): Likewise.
>
> Please add a PR marker to the ChangeLog entry. A testcase would be great,
> but from reading the PR that doesn't seem possible without some heroic
> efforts.
>
> OK with the PR marker and a comment before the two calls indicating why
> those two calls are necessary.
>
> jeff
[asan] change libasan to libsanitizer
Hi,

Here is the patch to change libasan to libsanitizer and reorganize the directory. I divided the patch into three parts for review.

patch.part1.txt: contains the changes at the outermost level.
patch.part2.txt.bz2: removes libasan.
patch.part3.txt.bz2: adds libsanitizer.

Is it ok for the asan branch?

2012-11-1  Wei Mi

	* configure.ac: Change target-libasan to target-libsanitizer.
	* configure: Regenerate.
	* Makefile.def: Change libasan module to libsanitizer.
	* Makefile.in: Regenerate.
	* libsanitizer: Change libasan to libsanitizer and add an empty
	tsan directory under libsanitizer.

Thanks,
Wei.

Index: configure
===
--- configure (revision 193063)
+++ configure (working copy)
@@ -2771,7 +2771,7 @@ target_libraries="target-libgcc \
		target-libitm \
		target-libstdc++-v3 \
		target-libmudflap \
-		target-libasan \
+		target-libsanitizer \
		target-libssp \
		target-libquadmath \
		target-libgfortran \

Index: Makefile.in
===
--- Makefile.in (revision 193063)
+++ Makefile.in (working copy)
@@ -575,7 +575,7 @@ all:
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the target machine work.
-TARGET_LIB_PATH = $(TARGET_LIB_PATH_libstdc++-v3)$(TARGET_LIB_PATH_libmudflap)$(TARGET_LIB_PATH_libasan)$(TARGET_LIB_PATH_libssp)$(TARGET_LIB_PATH_libgomp)$(TARGET_LIB_PATH_libitm)$(TARGET_LIB_PATH_libatomic)$(HOST_LIB_PATH_gcc)
+TARGET_LIB_PATH = $(TARGET_LIB_PATH_libstdc++-v3)$(TARGET_LIB_PATH_libmudflap)$(TARGET_LIB_PATH_libsanitizer)$(TARGET_LIB_PATH_libssp)$(TARGET_LIB_PATH_libgomp)$(TARGET_LIB_PATH_libitm)$(TARGET_LIB_PATH_libatomic)$(HOST_LIB_PATH_gcc)
 @if target-libstdc++-v3
 TARGET_LIB_PATH_libstdc++-v3 = $$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs:
@@ -585,9 +585,9 @@ TARGET_LIB_PATH_libstdc++-v3 = $$r/$(TAR
 TARGET_LIB_PATH_libmudflap = $$r/$(TARGET_SUBDIR)/libmudflap/.libs:
 @endif target-libmudflap
-@if target-libasan
-TARGET_LIB_PATH_libasan = $$r/$(TARGET_SUBDIR)/libasan/.libs:
-@endif target-libasan
+@if target-libsanitizer
+TARGET_LIB_PATH_libsanitizer = $$r/$(TARGET_SUBDIR)/libsanitizer/.libs:
+@endif target-libsanitizer
 @if target-libssp
 TARGET_LIB_PATH_libssp = $$r/$(TARGET_SUBDIR)/libssp/.libs:
@@ -924,7 +924,7 @@ configure-host: \
 configure-target: \
     maybe-configure-target-libstdc++-v3 \
     maybe-configure-target-libmudflap \
-    maybe-configure-target-libasan \
+    maybe-configure-target-libsanitizer \
     maybe-configure-target-libssp \
     maybe-configure-target-newlib \
     maybe-configure-target-libgcc \
@@ -1073,7 +1073,7 @@ all-host: maybe-all-lto-plugin
 all-target: maybe-all-target-libstdc++-v3
 @endif target-libstdc++-v3-no-bootstrap
 all-target: maybe-all-target-libmudflap
-all-target: maybe-all-target-libasan
+all-target: maybe-all-target-libsanitizer
 all-target: maybe-all-target-libssp
 all-target: maybe-all-target-newlib
 @if target-libgcc-no-bootstrap
@@ -1164,7 +1164,7 @@ info-host: maybe-info-lto-plugin
 info-target: maybe-info-target-libstdc++-v3
 info-target: maybe-info-target-libmudflap
-info-target: maybe-info-target-libasan
+info-target: maybe-info-target-libsanitizer
 info-target: maybe-info-target-libssp
 info-target: maybe-info-target-newlib
 info-target: maybe-info-target-libgcc
@@ -1246,7 +1246,7 @@ dvi-host: maybe-dvi-lto-plugin
 dvi-target: maybe-dvi-target-libstdc++-v3
 dvi-target: maybe-dvi-target-libmudflap
-dvi-target: maybe-dvi-target-libasan
+dvi-target: maybe-dvi-target-libsanitizer
 dvi-target: maybe-dvi-target-libssp
 dvi-target: maybe-dvi-target-newlib
 dvi-target: maybe-dvi-target-libgcc
@@ -1328,7 +1328,7 @@ pdf-host: maybe-pdf-lto-plugin
 pdf-target: maybe-pdf-target-libstdc++-v3
 pdf-target: maybe-pdf-target-libmudflap
-pdf-target: maybe-pdf-target-libasan
+pdf-target: maybe-pdf-target-libsanitizer
 pdf-target: maybe-pdf-target-libssp
 pdf-target: maybe-pdf-target-newlib
 pdf-target: maybe-pdf-target-libgcc
@@ -1410,7 +1410,7 @@ html-host: maybe-html-lto-plugin
 html-target: maybe-html-target-libstdc++-v3
 html-target: maybe-html-target-libmudflap
-html-target: maybe-html-target-libasan
+html-target: maybe-html-target-libsanitizer
 html-target: maybe-html-target-libssp
 html-target: maybe-html-target-newlib
 html-target: maybe-html-target-libgcc
@@ -1492,7 +1492,7 @@ TAGS-host: maybe-TAGS-lto-plugin
 TAGS-target: maybe-TAGS-target-libstdc++-v3
 TAGS-target: maybe-TAGS-target-libmudflap
-TAGS-target: maybe-TAGS-target-libasan
+TAGS-target: maybe-TAGS-target-libsanitizer
 TAGS-target: maybe-TAGS-target-libssp
 TAGS-target: maybe-TAGS-target-newlib
 TAGS-target: maybe-TAGS-target-libgcc
@@ -1574,7 +1574,7 @@ install-info-host: maybe-install-info-lt
 install-info-target: maybe-in
Re: [asan] change libasan to libsanitizer
Yes. That will be easier and clearer. The patch is too big.

Thanks,
Wei.

On Thu, Nov 1, 2012 at 1:19 PM, Xinliang David Li wrote:
> Will it be easier if you just rolled back your previous libasan
> library changes, and resubmit it with the restructured directory?
>
> David
>
> On Thu, Nov 1, 2012 at 1:17 PM, Wei Mi wrote:
>> patch.part2.txt.bz2 and patch.part3.txt.bz2 are still too big.
>>
>> Divide patch.part2.txt.bz2 into two parts:
>> patch.part2-1.txt.bz2 + patch.part2-2.txt.bz2
>>
>> Divide patch.part3.txt.bz2 into two parts:
>> patch.part3-1.txt.bz2 + patch.part3-2.txt.bz2
>>
>> This is patch.part2-1.txt.bz2.
>>
>> Thanks,
>> Wei.
>>
>> On Thu, Nov 1, 2012 at 1:02 PM, Wei Mi wrote:
>>> Hi,
>>>
>>> Here is the patch to change libasan to libsanitizer and reorganize the
>>> directory. I divided the patch into three parts for review.
>>>
>>> patch.part1.txt: contains the changes at the outermost level.
>>> patch.part2.txt.bz2: removes libasan.
>>> patch.part3.txt.bz2: adds libsanitizer.
>>>
>>> Is it ok for the asan branch?
>>>
>>> 2012-11-1  Wei Mi
>>>
>>> 	* configure.ac: Change target-libasan to target-libsanitizer.
>>> 	* configure: Regenerate.
>>> 	* Makefile.def: Change libasan module to libsanitizer.
>>> 	* Makefile.in: Regenerate.
>>> 	* libsanitizer: Change libasan to libsanitizer and add
>>> 	an empty tsan directory under libsanitizer.
>>>
>>> Thanks,
>>> Wei.
Re: [asan] change libasan to libsanitizer
Ok, I will check in the patch.

Thanks,
Wei.

On Thu, Nov 1, 2012 at 3:03 PM, Xinliang David Li wrote:
> On Thu, Nov 1, 2012 at 2:23 PM, Xinliang David Li wrote:
>> On Thu, Nov 1, 2012 at 2:17 PM, Wei Mi wrote:
>>> Thanks for the suggestion!
>>>
>>> The planned svn commands will be:
>>>
>>> svn mv libasan libsanitizer
>>> svn add libsanitizer/asan
>>> svn add libsanitizer/tsan
>>
>> Probably keep the tsan creation out of this patch.
>
> If there are no other objections, this patch is ok for the asan branch
> with the above. There might be errors spotted under trunk review, but
> that should be fine.
>
> thanks,
>
> David
>
>> David
>>
>>> cd libsanitizer
>>> for i in `ls asan_*`; do
>>>   svn mv $i asan/$i
>>> done
>>>
>>> Then apply the two patches attached on top of that. patch.1.txt is to
>>> handle the toplevel configure and Makefile changes. patch.2.txt is to
>>> handle the configure and Makefile changes in libsanitizer.
>>>
>>> Thanks,
>>> Wei.
>>>
>>> On Thu, Nov 1, 2012 at 1:34 PM, Xinliang David Li wrote:
>>>> that sounds good to me.
>>>>
>>>> David
>>>>
>>>> On Thu, Nov 1, 2012 at 1:31 PM, Jakub Jelinek wrote:
>>>>> On Thu, Nov 01, 2012 at 01:19:42PM -0700, Xinliang David Li wrote:
>>>>>> Will it be easier if you just rolled back your previous libasan
>>>>>> library changes, and resubmit it with the restructured directory?
>>>>>
>>>>> I think better would be if you didn't apply it as a patch with lots
>>>>> of svn add/svn rm commands, but instead just svn mv the directory or
>>>>> files. So it would be better if you could post the planned svn
>>>>> commands and the patch that would be applied on top of that.
>>>>>
>>>>> Jakub
Re: [asan] change libasan to libsanitizer
The patch has been checked in. Committed revision 193074.

Thanks,
Wei.

On Thu, Nov 1, 2012 at 3:16 PM, Wei Mi wrote:
> Ok, I will check in the patch.
>
> Thanks,
> Wei.
>
> [...]
Re: [tsan] ThreadSanitizer instrumentation part
Hi,

Thanks for so many useful comments! I updated the patch according to the comments. The major changes include adding sanitizer.def and generating gimple directly. The new patch file is attached.

> On Wed, Oct 31, 2012 at 11:34:10AM -0700, Wei Mi wrote:
>> gcc/ChangeLog:
>> 2012-10-31  Wei Mi
>
> If Dmitry wrote parts of the patch, it would be nice to mention
> him in the ChangeLog too.
> All ChangeLog entries should end with a dot.

Changed.

2012-10-31  Dmitry Vyukov
	    Wei Mi

	* Makefile.in (tsan.o): New.
	(BUILTINS_DEF): Add sanitizer.def.
	* sanitizer.def: New.
	* passes.c (init_optimization_passes): Add tsan passes.
	* tree-pass.h (register_pass_info): Ditto.
	* cfghooks.h (GCC_CFGHOOKS_H): Avoid including duplicate headers.
	* doc/invoke.texi: Document tsan related options.
	* toplev.c (compile_file): Add tsan pass in driver.
	* gcc.c (LINK_COMMAND_SPEC): Add -lasan to the link command if
	-fthread_sanitizer is on.
	* tsan.c: New file about tsan.
	* tsan.h: Ditto.

>> struct cfg_hooks
>> @@ -219,3 +222,4 @@ extern void gimple_register_cfg_hooks (v
>> extern struct cfg_hooks get_cfg_hooks (void);
>> extern void set_cfg_hooks (struct cfg_hooks);
>>
>> +#endif  /* GCC_CFGHOOKS_H */
>
> Why this? Simply don't include that header in tsan.c, it is already
> included by basic-block.h.

Removed cfghooks.h from tsan.c, and removed the #ifdef GCC_CFGHOOKS_H from cfghooks.h.

> Can't google just assign the code to FSF, and use a standard boilerplate
> as everything else in gcc/ ?

Copied from the asan header and made some changes.

>> +static tree
>> +get_vptr_update_decl (void)
>> +{
>> +  tree typ;
>> +  static tree decl;
>> +
>> +  if (decl != NULL)
>> +    return decl;
>> +  typ = build_function_type_list (void_type_node,
>> +				  ptr_type_node, ptr_type_node, NULL_TREE);
>> +  decl = build_func_decl (typ, "__tsan_vptr_update");
>> +  return decl;
>> +}
> ...
>
> Instead of this (but same applies to asan), I think we should just consider
> putting it into builtins.def (or have sanitizer.def like there is sync.def
> or omp-builtins.def). The problem might be non-C/C++ family frontends
> though.

Created sanitizer.def and used builtin_decl_implicit to create the builtin decls.

>> +  while (TREE_CODE (expr_type) == ARRAY_TYPE)
>> +    expr_type = TREE_TYPE (expr_type);
>> +  size = (TREE_INT_CST_LOW (TYPE_SIZE (expr_type))) / BITS_PER_UNIT;
>
> int_size_in_bytes.

Changed.

> preferrably without building everything as trees, then gimplifying it.

Generate gimple directly. Removed the functions instr_memory_access, instr_vptr_update, instr_func_entry and instr_func_exit.

> For func_calls and func_mops, I don't see why you need two variables
> instead of just one, and why the function can't just return a bool
> saying whether entry/exit needs to be instrumented or not.

instrument_memory_accesses now returns a bool indicating whether entry/exit needs to be instrumented. func_calls and func_mops are removed.

>> +set_location (gimple_seq seq, location_t loc)
>> +{
>> +  gimple_seq_node n;
>> +
>> +  for (n = gimple_seq_first (seq); n != NULL; n = n->gsbase.next)
>
> This really should use a stmt iterator.

set_location is removed; the gimple location is set with gimple_set_location every time a new gimple statement is inserted.

>> +  FOR_EACH_BB (bb)
>> +    {
>> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> +	{
>> +	  instrument_gimple (gsi);
>> +	}
>> +    }
>
> Extraneous two pairs of {}s.

Fixed.

>> +struct gimple_opt_pass pass_tsan = {{
>
> Please watch formatting of other gimple_opt_pass structures.
> {{ isn't used anywhere.

Fixed.
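(As a side note on the "generate gimple directly" change above: each instrumentation call is now emitted roughly as in the following minimal sketch. Here decl, expr_ptr, loc and gsi stand for values the pass already has at that point; instrumentation of a value written by a call is inserted after the call with gsi_insert_after instead, since the call can contain synchronization.)

  /* Build the instrumentation call as a gimple statement directly,
     with no intermediate tree building and gimplification.  */
  gimple g = gimple_build_call (decl, 1, expr_ptr);
  gimple_set_location (g, loc);
  gsi_insert_before (&gsi, g, GSI_SAME_STMT);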
> Is that the option that LLVM uses (I'm talking about -faddress-sanitizer
> in LLVM vs. -fasan right now in GCC, isn't that similar?).

Fixed.

> +static tree
> +get_init_decl (void)
> +{
> +  tree typ;
> +  static tree decl;
> +
> +  if (decl != NULL)
> +    return decl;
> +  typ = build_function_type_list (void_type_node, NULL_TREE);
> +  decl = build_func_decl (typ, "__tsan_init");
> +  return decl;
> +}
>
> The above can crash the compiler btw, as that static tree decl
> (in many other functions) is not GTY(()) marked (must be file scope for
> that), thus ggc_collect might free it. Also, please use type
> instead of typ for variable names.

The function get_init_decl is removed after generating gimple directly.

>> +  /* Instrumentation for assignment of a function result
>> +     must be inserted after the c
Re: [tsan] ThreadSanitizer instrumentation part
Sorry, I attached an incorrect patch.txt yesterday. This is the correct one.

Thanks,
Wei.

On Fri, Nov 2, 2012 at 6:31 PM, Wei Mi wrote:
> Hi,
>
> Thanks for so many useful comments! I updated the patch according to the
> comments. The major changes include adding sanitizer.def and generating
> gimple directly. The new patch file is attached.
>
> [...]
Re: [tsan] ThreadSanitizer instrumentation part
Hi Jakub,

Thanks for the comments. I fixed most of them, except the setting of the TODO_ flags. The new patch.txt is attached.

Thanks,
Wei.

>> +				  TODO_verify_all | TODO_update_ssa
>
> Ideally you shouldn't need TODO_update_ssa.

I got an error when I removed TODO_update_ssa, so I kept it.

>> +				  | TODO_update_address_taken /* todo_flags_finish */
>
> And why this?

If we generate tsan_read (&a) for a non-address-taken static variable a, we need to change a to be address taken, right?

On Sat, Nov 3, 2012 at 11:39 AM, Jakub Jelinek wrote:
> On Sat, Nov 03, 2012 at 10:05:35AM -0700, Wei Mi wrote:
>> --- gcc/sanitizer.def (revision 0)
>> +++ gcc/sanitizer.def (revision 0)
>> @@ -0,0 +1,31 @@
>> +DEF_SANITIZER_BUILTIN(BUILT_IN_TSAN_WRITE_16, "__tsan_write16",
>> +		      BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
>
> Please remove the trailing whitespace.

Done.

>> +/* Builtin used by the implementation of libsanitizer. These
>> +   functions are mapped to the actual implementation of the
>> +   libasan and libtsan library. */
>> +#undef DEF_SANITIZER_BUILTIN
>> +#define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>> +  DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
>> +	       true, true, true, ATTRS, true, flag_tsan)
>
> That should be eventually flag_asan || flag_tsan, as sanitizer.def
> should be also for asan builtins, or it must be DEF_TSAN_BUILTIN/tsan.def.

Postponed; I will fix it after the asan checkin to trunk.

>> +static tree
>> +get_memory_access_decl (bool is_write, unsigned size)
>> +{
>> +  enum built_in_function fcode;
>> +
>> +  if (size <= 1)
>> +    fcode = is_write ? BUILT_IN_TSAN_WRITE_1 :
>> +		       BUILT_IN_TSAN_READ_1;
>
> Formatting, : should be below ?.

Fixed.

>> +  return builtin_decl_implicit(fcode);
>
> Space before (. Several times in the code.

Fixed.

> Also, as is the tsan builtins will be defined only for
> C/C++ family FEs, so either something needs to be done
> for other FEs, or perhaps the pass should just error out
> if say the BUILT_IN_TSAN_INIT isn't defined.

Wrapped builtin_decl_implicit in get_tsan_builtin_decl. If builtin_decl_implicit returns an invalid decl, output an error message and then exit.

>> +static tree
>> +is_vptr_store (gimple stmt, tree expr, int is_write)
>
> is_write should be bool,
>
>> +{
>> +  if (is_write == 1
>
> and this just is_write
>
>> +static bool
>> +is_load_of_const_p (tree expr, int is_write)
>> +{
>> +  if (is_write)
>> +    return false;
>
> Again.

Fixed.

>> +      /* The var does not live in memory -> no possibility of races. */
>> +      || (tcode == VAR_DECL
>> +	  && !TREE_ADDRESSABLE (expr)
>> +	  && TREE_STATIC (expr) == 0)
>
> Please use && !is_global_var (expr) here instead.

Changed.

>> +  /* TODO: handle other cases
>> +     (FIELD_DECL, MEM_REF, ARRAY_RANGE_REF, TARGET_MEM_REF, ADDR_EXPR). */
>
> The comment is obsolete, MEM_REF is handled.

Fixed.

>> +  if (tcode != ARRAY_REF
>> +      && tcode != VAR_DECL
>> +      && tcode != COMPONENT_REF
>> +      && tcode != INDIRECT_REF
>> +      && tcode != MEM_REF)
>> +    return false;
>> +
>> +  stmt = gsi_stmt (gsi);
>> +  loc = gimple_location (stmt);
>> +  rhs = is_vptr_store (stmt, expr, is_write);
>> +#ifdef DEBUG
>> +  if (rhs == NULL)
>> +    gcc_assert (is_gimple_addressable (expr));
>> +#endif
>
> That should be
>   gcc_checking_assert (rhs != NULL || is_gimple_addressable (expr));
> if you want to check it in checking versions only.

Fixed.

>> +  size = int_size_in_bytes(expr_type);
>
> Missing space.

Fixed.

>> +      g = gimple_build_call(
>> +	    get_memory_access_decl(is_write, size),
>> +	    1, expr_ptr);
>
> And the formatting here is completely wrong.

Fixed.

>> +    }
>> +  else
>> +    g = gimple_build_call(
>> +	  builtin_decl_implicit(BUILT_IN_TSAN_VPTR_UPDATE),
>> +	  1, expr_ptr);
>> +  gimple_set_location (g, loc);
>> +  /* Instrumentation for assignment of a function result
>> +     must be inserted after the call. Instrumentation for
>> +     reads of function arguments must be inserted before the call.
>> +     That's because the call can contain synchronization. */
>> +  if (is_gimple_call (stmt) && is_write)
Re: [PATCH 01/10] Initial import of asan from the Google branch into trunk
> Other issues:
>
> * libasan does not seem to be a multilib, at least I only find the 64bit
>   version on x86-64-gnu-linux such that "-m32" compilation fails.

That is because the configure file was originally shared between asan and tsan (tsan doesn't support 32 bit). Diego has suggested that I split the configure, so we will send a patch to support the 32-bit version of asan after Dodji's patches are checked in to trunk.

Thanks,
Wei.
Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE directly, small cleanups)
>>> 2. Large Gtest-based unittest. This is a set of c++ files that should
>>> be built with the asan switch, depends on gtest
>>> (http://code.google.com/p/googletest/).
>>> http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/asan/tests/asan_test.cc?revision=166104&view=markup
>>> This should be easy to port to GCC, but it requires gtest.
>>
>> I don't think we want to depend on gtest, if the tests only use a small
>> subset of that, it would be better to just use some header with macros
>> compatible with that for assertions and the like.
>
> We use a large and heavy subset of gtest, namely: death tests.
> It can't be easily replaced with "some header with macros".

gtest integrates multiple tests into the same file, with each test being a single-line check. I cannot think of a way to migrate this to dejagnu without using gtest, except splitting each gtest file into multiple files, one file per test. asan has about 130 tests, so we would have to write about 130 files -- doable, but painful.

>>> 3. Full output tests (a .cc file should be built with the asan switch,
>>> the executable should be run and the stderr compared with the expected
>>> output). Example:
>>> http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/asan/lit_tests/stack-overflow.cc?revision=165391&view=markup
>>> These can be ported to GCC, but the uses of FileCheck
>>> (http://llvm.org/docs/CommandGuide/FileCheck.html) will need to be
>>> replaced with GCC's analog.
>>> We should probably start with these tests.
>>
>> Dejagnu in GCC has
>>
>> ! { dg-do run }
>> ! { dg-options "-fbounds-check" }
>> ! { dg-shouldfail "Duplicate value 2 in ORDER argument to RESHAPE intrinsic" }
>> program main
>>   implicit none
>>   integer(kind=1), dimension(6) :: source1 = (/ 1, 2, 3, 4, 5, 6 /)
>>   integer, dimension(2) :: shape1 = (/ 2, 3/)
>>   integer(kind=1), dimension(2) :: pad1 = (/ 0, 0/)
>>   character(len=200) :: l1, l2
>>   integer :: i1, i2
>>
>>   l1 = "2 2"
>>   read(unit=l1,fmt=*) i1, i2
>>   write (unit=l2,fmt=*) reshape(source1, shape1, pad1, (/i1, i2/)) ! Invalid
>> end program main
>> ! { dg-output "Fortran runtime error: Duplicate value 2 in ORDER argument to RESHAPE intrinsic" }
>>
>> style markings, dg-shouldfail says that the program is expected to fail
>> rather than pass (if it aborts), and dg-output (perhaps multiple) can
>> contain regexps to match against stderr + stdout joined. Haven't looked
>> at the asan tests yet, do you expect just one ASAN abort per test?
>
> These tests do just one abort (actually, _exit(1)) per test.
> Let's start with these.
>
> --kcc

I will start with this.

Thanks,
Wei.
Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE directly, small cleanups)
On Fri, Nov 9, 2012 at 11:13 AM, Jakub Jelinek wrote:
> On Fri, Nov 09, 2012 at 11:05:37AM -0800, Wei Mi wrote:
>> gtest integrates multiple tests into the same file, with each test
>> being a single-line check. I cannot think of a way to migrate this to
>> dejagnu without using gtest, except splitting each gtest file into
>> multiple files, one file per test. asan has about 130 tests, so we
>> would have to write about 130 files -- doable, but painful.
>
> See the glibc _FORTIFY_SOURCE check I've referenced, there it is 3 lines
> per test expected to crash, but could be done in a single macro too.
> If the failure can be intercepted, it can be done in a single process (e.g.
> SIGABRT can, _exit can't that easily), otherwise perhaps some way to skip
> previous tests and communicate with dejagnu how many times it should run
> the executable.
>
> Jakub

With setjmp/longjmp doing multiple tests in a single test file, test statements earlier in the file could affect the tests after them. gtest instead forks a new process for every test statement: the forked process runs only that one test and skips all the other test statements, so multiple test statements in the same test file are guaranteed to be independent of each other. If we used the setjmp/longjmp pattern, existing testsuites might need to be rewritten wherever their test statements could affect each other.

Thanks,
Wei.
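P.S. To make the isolation point concrete, here is a small standalone sketch (plain C, not gtest itself; the helper names are made up) of the fork-per-check pattern that gtest death tests rely on. Each check runs in a fresh child process, so a crashing check cannot affect the checks that follow:

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/wait.h>
  #include <unistd.h>

  /* Run one check in a forked child; report whether it exited cleanly.  */
  static int
  run_check_isolated (void (*check) (void))
  {
    pid_t pid = fork ();
    if (pid == 0)
      {
        check ();   /* Runs in a fresh copy of the process state.  */
        _exit (0);  /* Reached only if the check did not crash.  */
      }
    int status;
    waitpid (pid, &status, 0);
    /* A "death" shows up in the parent as a signal or non-zero exit.  */
    return WIFEXITED (status) && WEXITSTATUS (status) == 0;
  }

  static void crashing_check (void) { abort (); }
  static void passing_check (void) { }

  int
  main (void)
  {
    printf ("crashing check ok: %d\n", run_check_isolated (crashing_check));
    printf ("passing check ok: %d\n", run_check_isolated (passing_check));
    return 0;
  }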
Re: [tsan] ThreadSanitizer instrumentation part
For TODO_update_ssa: when we insert tsan_write (&a), the current function's ssa_renaming_needed flag is set in finalize_ssa_defs, because we insert a non-SSA-name vdef. An assertion in execute_todo then checks that TODO_update_ssa is set whenever the current function's ssa_renaming_needed flag is set. That is why I get an assertion failure when I remove the TODO_update_ssa flag. Is it ok to keep TODO_update_ssa and TODO_update_address_taken?

Thanks,
Wei.

On Mon, Nov 5, 2012 at 4:37 PM, Wei Mi wrote:
> Hi Jakub,
>
> Thanks for the comments. I fixed most of them, except the setting of the
> TODO_ flags. The new patch.txt is attached.
>
> Thanks,
> Wei.
>
> [...]
Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE directly, small cleanups)
>> [...]
>>
>> These tests do just one abort (actually, _exit(1)) per test.
>> Let's start with these.
>>
>> --kcc

I migrated a test in the third category. Please help check whether it is ok; then I will migrate the rest. The files added are as follows and attached. (Please forgive that I use -fasan in asan.exp; I used an old repository to try the migration.)

gcc/testsuite/lib/asan-dg.exp: copied from libmudflap/lib/mfdg.exp; it rewrites the proc dg-test to enable the dg-output check in xfail status. The existing dg-output check in /usr/share/dejagnu/dg.exp only works in pass status, not in xfail status.

gcc/testsuite/g++.dg/asan/asan.exp
gcc/testsuite/g++.dg/asan/memcmp_test.cc

A problem: llvm provides llvm-symbolizer and asan_symbolize.py to map a virtual address to its parent function name, which is used in "CHECK: {{#0.*memcmp}}" in the llvm test below. I don't know whether gcc provides similar tools. How should we deal with it?
memcmp_test.cc in llvm:

// RUN: %clangxx_asan -m64 -O0 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m64 -O1 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m64 -O2 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m64 -O3 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m32 -O0 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m32 -O1 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m32 -O2 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s
// RUN: %clangxx_asan -m32 -O3 %s -o %t %lib && %t 2>&1 | %symbolize | FileCheck %s

#include <string.h>
int main(int argc, char **argv) {
  char a1[] = {argc, 2, 3, 4};
  char a2[] = {1, 2*argc, 3, 4};
  int res = memcmp(a1, a2, 4 + argc); // BOOM
  // CHECK: AddressSanitizer stack-buffer-overflow
  // CHECK: {{#0.*memcmp}}
  // CHECK: {{#1.*main}}
  return res;
}

memcmp_test.cc planned for gcc:

#include <string.h>
int main(int argc, char **argv) {
  char a1[] = {argc, 2, 3, 4};
  char a2[] = {1, 2*argc, 3, 4};
  int res = memcmp(a1, a2, 4 + argc); // BOOM
  return res;
}

/* { dg-output "AddressSanitizer stack-buffer-overflow.*" } */
/* { dg-do run { xfail *-*-* } } */

Thanks,
Wei.

[Attachments: asan-dg.exp, asan.exp, memcmp_test.cc]
Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE directly, small cleanups)
> Will this run the test in all modes (O0 vs O2 and -m32 vs -m64)?
>
> --kcc

Yes, it runs in all modes: O0/O1/O2/O3 + -m32/-m64, as specified in asan.exp.

Thanks,
Wei.
Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE directly, small cleanups)
Sure, I can add the support. To run -mx32 tests, we need to add x32 multilib support for libasan first, right?

Thanks,
Wei.

On Tue, Nov 13, 2012 at 2:14 PM, H.J. Lu wrote:
> On Tue, Nov 13, 2012 at 1:49 PM, Wei Mi wrote:
>>> Will this run the test in all modes (O0 vs O2 and -m32 vs -m64)?
>>>
>>> --kcc
>>
>> Yes, it runs in all modes: O0/O1/O2/O3 + -m32/-m64, as specified in
>> asan.exp.
>
> Does it support -mx32?
>
> --
> H.J.
Re: [tsan] ThreadSanitizer instrumentation part
Thanks for catching this. I updated the patch.

Regards,
Wei.

On Tue, Nov 13, 2012 at 4:54 PM, Richard Henderson wrote:
> On 11/13/2012 04:08 PM, Wei Mi wrote:
>> +extern void tsan_finish_file (void);
>> +
>> +#endif  /* TREE_TSAN */
>> +/* ThreadSanitizer, a data race detector.
>> +   Copyright (C) 2011 Free Software Foundation, Inc.
>> +   Contributed by Dmitry Vyukov
>
> Careful, you've got double applied patches there.
>
> r~

Index: gcc/passes.c
===
--- gcc/passes.c (revision 193482)
+++ gcc/passes.c (working copy)
@@ -1457,6 +1457,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_asan);
+      NEXT_PASS (pass_tsan);
       NEXT_PASS (pass_tree_loop);
       {
	struct opt_pass **p = &pass_tree_loop.pass.sub;
@@ -1563,6 +1564,7 @@ init_optimization_passes (void)
     }
       NEXT_PASS (pass_lower_complex_O0);
       NEXT_PASS (pass_asan_O0);
+      NEXT_PASS (pass_tsan_O0);
       NEXT_PASS (pass_cleanup_eh);
       NEXT_PASS (pass_lower_resx);
       NEXT_PASS (pass_nrv);

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 193490)
+++ gcc/doc/invoke.texi (working copy)
@@ -289,7 +289,8 @@ Objective-C and Objective-C++ Dialects}.
 @item Debugging Options
 @xref{Debugging Options,,Options for Debugging Your Program or GCC}.
 @gccoptlist{-d@var{letters}  -dumpspecs  -dumpmachine  -dumpversion @gol
--faddress-sanitizer -fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol
+-fsanitize=@var{style} @gol
+-fdbg-cnt-list -fdbg-cnt=@var{counter-value-list} @gol
 -fdisable-ipa-@var{pass_name} @gol
 -fdisable-rtl-@var{pass_name} @gol
 -fdisable-rtl-@var{pass-name}=@var{range-list} @gol
@@ -309,6 +310,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-tree-ssa@r{[}-@var{n}@r{]} -fdump-tree-pre@r{[}-@var{n}@r{]} @gol
 -fdump-tree-ccp@r{[}-@var{n}@r{]} -fdump-tree-dce@r{[}-@var{n}@r{]} @gol
 -fdump-tree-gimple@r{[}-raw@r{]} -fdump-tree-mudflap@r{[}-@var{n}@r{]} @gol
+-fdump-tree-tsan@r{[}-@var{n}@r{]} @gol
 -fdump-tree-dom@r{[}-@var{n}@r{]} @gol
 -fdump-tree-dse@r{[}-@var{n}@r{]} @gol
 -fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol
@@ -382,8 +384,8 @@ Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
--fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
--fno-default-inline @gol
+-fmove-loop-invariants -fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
+-fthread-sanitizer-ignore -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
 -fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol
 -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol
@@ -5983,6 +5985,11 @@ appending @file{.dce} to the source file
 Dump each function after adding mudflap instrumentation.  The file name is
 made by appending @file{.mudflap} to the source file name.

+@item tsan
+@opindex fdump-tree-tsan
+Dump each function after adding ThreadSanitizer instrumentation.  The file
+name is made by appending @file{.tsan} to the source file name.
+
 @item sra
 @opindex fdump-tree-sra
 Dump each function after performing scalar replacement of aggregates.  The
@@ -6849,11 +6856,14 @@ assumptions based on that.

 The default is @option{-fzero-initialized-in-bss}.

-@item -faddress-sanitizer
-Enable AddressSanitizer, a fast memory error detector.
-Memory access instructions will be instrumented to detect
-out-of-bounds and use-after-free bugs. So far only heap bugs will be detected.
-See @uref{http://code.google.com/p/address-sanitizer/} for more details.
+@item -fsanitize=[address|thread]
+Enable AddressSanitizer or ThreadSanitizer.  AddressSanitizer is a fast
+memory error detector.  Memory access instructions will be instrumented
+to detect out-of-bounds, use-after-free, stack overflow and global
+overflow bugs.  ThreadSanitizer is a fast data race detector.
+See @uref{http://code.google.com/p/address-sanitizer/} and
+@uref{http://code.google.com/p/data-race-test/wiki/ThreadSanitizer}
+for more details.

 @item -fmudflap -fmudflapth -fmudflapir
 @opindex fmudflap
@@ -6881,6 +6891,11 @@ instrumentation (and therefore faster ex
 some protection against outright memory corrupting writes, but allows
 erroneously read data to propagate within a program.

+@item -fthread-sanitizer-ignore
+@opindex fthread-sanitizer-ignore
+Add ThreadSanitizer instrumentation.  Use @option{-fthread-sanitizer-ignore}
+to specify an ignore file.  Refer to http://go/tsan for details.
+
 @item -fthread-jumps
 @opindex fthread-jumps
 Perform optimizati
Re: [tsan] ThreadSanitizer instrumentation part
Hi,

Is it ok for the trunk?

Thanks,
Wei.

On Tue, Nov 13, 2012 at 5:06 PM, Wei Mi wrote:
> Thanks for catching this. I updated the patch.
>
> Regards,
> Wei.
>
> On Tue, Nov 13, 2012 at 4:54 PM, Richard Henderson wrote:
>> On 11/13/2012 04:08 PM, Wei Mi wrote:
>>> +extern void tsan_finish_file (void);
>>> +
>>> +#endif  /* TREE_TSAN */
>>> +/* ThreadSanitizer, a data race detector.
>>> +   Copyright (C) 2011 Free Software Foundation, Inc.
>>> +   Contributed by Dmitry Vyukov
>>
>> Careful, you've got double applied patches there.
>>
>> r~
[PATCH] Change -faddress-sanitizer to -fsanitize=address
Hi,

This patch is to change -faddress-sanitizer to -fsanitize=address. Ok for trunk?

2012-11-19  Wei Mi

	* cfgexpand.c (partition_stack_vars): Change flag_asan to
	flag_sanitize.
	(expand_stack_vars): Likewise.
	(defer_stack_allocation): Likewise.
	(expand_used_vars): Likewise.
	* varasm.c (assemble_noswitch_variable): Likewise.
	(assemble_variable): Likewise.
	(place_block_symbol): Likewise.
	* asan.c (gate_asan): Likewise.
	(gate_asan_O0): Likewise.
	* toplev.c (compile_file): Likewise.
	(process_options): Likewise.
	* common.opt: Change faddress-sanitizer to fsanitize=address.
	* gcc.c (LINK_COMMAND_SPEC): Likewise.
	* testsuite/lib/asan-dg.exp
	(check_effective_target_faddress_sanitizer): Likewise.
	(asan_init): Likewise.
	* flag-types.h (sanitize_type): New enum type.
	* doc/invoke.texi (-fsanitize=[address|thread]): Document.

Thanks,
Wei.

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 193614)
+++ gcc/cfgexpand.c (working copy)
@@ -765,7 +765,7 @@ partition_stack_vars (void)
	     sizes, as the shorter vars wouldn't be adequately protected.
	     Don't do that for "large" (unsupported) alignment objects,
	     those aren't protected anyway.  */
-	  if (flag_asan && isize != jsize
+	  if (flag_sanitize == SANITIZE_ADDRESS && isize != jsize
	      && ialign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
	    break;
@@ -941,7 +941,7 @@ expand_stack_vars (bool (*pred) (size_t)
       alignb = stack_vars[i].alignb;
       if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
	{
-	  if (flag_asan && pred)
+	  if (flag_sanitize == SANITIZE_ADDRESS && pred)
	    {
	      HOST_WIDE_INT prev_offset = frame_offset;
	      tree repr_decl = NULL_TREE;
@@ -,7 +,7 @@ defer_stack_allocation (tree var, bool t
   /* If stack protection is enabled, *all* stack variables must be deferred,
      so that we can re-order the strings to the top of the frame.
      Similarly for Address Sanitizer.  */
-  if (flag_stack_protect || flag_asan)
+  if (flag_stack_protect || flag_sanitize == SANITIZE_ADDRESS)
     return true;

   /* We handle "large" alignment via dynamic allocation.  We want to handle
@@ -1693,7 +1693,7 @@ expand_used_vars (void)
	  expand_stack_vars (stack_protect_decl_phase_2, &data);
	}

-  if (flag_asan)
+  if (flag_sanitize == SANITIZE_ADDRESS)
     /* Phase 3, any partitions that need asan protection in addition
	to phase 1 and 2.  */
     expand_stack_vars (asan_decl_phase_3, &data);

Index: gcc/testsuite/lib/asan-dg.exp
===
--- gcc/testsuite/lib/asan-dg.exp (revision 193614)
+++ gcc/testsuite/lib/asan-dg.exp (working copy)
@@ -20,7 +20,7 @@
 proc check_effective_target_faddress_sanitizer {} {
     return [check_no_compiler_messages faddress_sanitizer object {
	void foo (void) { }
-    } "-faddress-sanitizer"]
+    } "-fsanitize=address"]
 }

@@ -83,12 +83,12 @@ proc asan_init { args } {
     }
     if [info exists ALWAYS_CXXFLAGS] {
	set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
-	set ALWAYS_CXXFLAGS [concat "{additional_flags=-faddress-sanitizer -g}" $ALWAYS_CXXFLAGS]
+	set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=address -g}" $ALWAYS_CXXFLAGS]
     } else {
	if [info exists TEST_ALWAYS_FLAGS] {
-	    set TEST_ALWAYS_FLAGS "$link_flags -faddress-sanitizer -g $TEST_ALWAYS_FLAGS"
+	    set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=address -g $TEST_ALWAYS_FLAGS"
	} else {
-	    set TEST_ALWAYS_FLAGS "$link_flags -faddress-sanitizer -g"
+	    set TEST_ALWAYS_FLAGS "$link_flags -fsanitize=address -g"
	}
     }
 }

Index: gcc/varasm.c
===
--- gcc/varasm.c (revision 193614)
+++ gcc/varasm.c (working copy)
@@ -1832,7 +1832,7 @@ assemble_noswitch_variable (tree decl, c
   size = tree_low_cst (DECL_SIZE_UNIT (decl), 1);
   rounded = size;

-  if (flag_asan && asan_protect_global (decl))
+  if (flag_sanitize == SANITIZE_ADDRESS && asan_protect_global (decl))
     size += asan_red_zone_size (size);

   /* Don't allocate zero bytes of common,
@@ -1990,7 +1990,7 @@ assemble_variable (tree decl, int top_le

   align_variable (decl, dont_output_data);

-  if (flag_asan
+  if (flag_sanitize == SANITIZE_ADDRESS
       && asan_protect_global (decl))
     {
       asan_protected = true;
Re: [PATCH] Change -faddress-sanitizer to -fsanitize=address
I cannot remove RejectNegative and use -fno-sanitize=address, or else I will break an assertion (opts-common.c:614). The assertion requires that -fxxx=var options set RejectNegative if the variable is of an enumerated type. I see that all the other -fxxx=xxx options in common.opt set RejectNegative.

Is it ok for me to check in the current patch and submit another patch if there is a better way to do it?

Thanks,
Wei.

On Mon, Nov 19, 2012 at 10:31 AM, Xinliang David Li wrote:
> Questions: are -fsanitize=thread and -fsanitize=address mutually exclusive
> here? If yes, that will be wrong.
>
> How about an -fsanitize=all option?
>
> As kcc mentioned, the -fno-.. form is not handled.
>
> David
>
> On Mon, Nov 19, 2012 at 10:14 AM, Wei Mi wrote:
>> Hi,
>>
>> This patch is to change -faddress-sanitizer to -fsanitize=address. Ok
>> for trunk?
>>
>> [...]
Re: [PATCH] Change -faddress-sanitizer to -fsanitize=address
I rewrote the patch according to Jakub's suggestion -- add the
following option in common.opt and keep flag_asan. The patch is
attached. Ok to check in?

fsanitize=address
Common Report Var(flag_asan)
Enable AddressSanitizer, a memory error detector

2012-11-21  Wei Mi

	* common.opt: Change faddress-sanitizer to fsanitize=address.
	* toplev.c (process_options): Likewise.
	* gcc.c (LINK_COMMAND_SPEC): Likewise.
	* testsuite/lib/asan-dg.exp
	(check_effective_target_faddress_sanitizer): Likewise.
	(asan_init): Likewise.
	* doc/invoke.texi (-fsanitize=[address|thread]): Document.

Thanks,
Wei.

On Mon, Nov 19, 2012 at 9:09 PM, Konstantin Serebryany wrote:
> On Mon, Nov 19, 2012 at 11:21 PM, Wei Mi wrote:
>> I cannot remove RejectNegative and use -fno-sanitize=address, or else
>> I will break an assertion (opts-common.c:614). The assertion requires
>> that a -fxxx= option set RejectNegative if its Var is of enumerated
>> type. I see that all the other -fxxx=xxx options in common.opt set
>> RejectNegative.
>>
>> Is it ok for me to check in the current patch and submit another
>> patch if there is a better way to do it?
>
> I would prefer to have the current patch earlier, rather than a more
> comprehensive patch later. Otherwise we may end up with too many uses
> of the old flag.
>
>> Thanks,
>> Wei.
>>
>> On Mon, Nov 19, 2012 at 10:31 AM, Xinliang David Li wrote:
>>> Questions: are -fsanitize=thread and -fsanitize=address mutually
>>> exclusive here? If yes, that will be wrong.
>>>
>>> How about a -fsanitize=all option?
>>>
>>> As kcc mentioned, the -fno-... form is not handled.
>>>
>>> David
>>>
>>> On Mon, Nov 19, 2012 at 10:14 AM, Wei Mi wrote:
>>>> Hi,
>>>>
>>>> This patch is to change -faddress-sanitizer to -fsanitize=address.
>>>> Ok for trunk?
>>>>
>>>> 2012-11-19  Wei Mi
>>>>
>>>>         * cfgexpand.c (partition_stack_vars): Change flag_asan to
>>>>         flag_sanitize.
>>>>         (expand_stack_vars): Likewise.
>>>>         (defer_stack_allocation): Likewise.
>>>>         (expand_used_vars): Likewise.
>>>>         * varasm.c (assemble_noswitch_variable): Likewise.
>>>>         (assemble_variable): Likewise.
>>>>         (place_block_symbol): Likewise.
>>>>         * asan.c (gate_asan): Likewise.
>>>>         (gate_asan_O0): Likewise.
>>>>         * toplev.c (compile_file): Likewise.
>>>>         (process_options): Likewise.
>>>>         * common.opt: Change faddress-sanitizer to fsanitize=address.
>>>>         * gcc.c (LINK_COMMAND_SPEC): Likewise.
>>>>         * testsuite/lib/asan-dg.exp
>>>>         (check_effective_target_faddress_sanitizer): Likewise.
>>>>         (asan_init): Likewise.
>>>>         * flag-types.h (sanitize_type): New enum type.
>>>>         * doc/invoke.texi (-fsanitize=[address|thread]): Document.
>>>>
>>>> Thanks,
>>>> Wei.

Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 193687)
+++ gcc/common.opt	(working copy)
@@ -840,7 +840,7 @@
 fargument-noalias-anything
 Common Ignore
 Does nothing. Preserved for backward compatibility.

-faddress-sanitizer
+fsanitize=address
 Common Report Var(flag_asan)
 Enable AddressSanitizer, a memory error detector

Index: gcc/testsuite/lib/asan-dg.exp
===================================================================
--- gcc/testsuite/lib/asan-dg.exp	(revision 193687)
+++ gcc/testsuite/lib/asan-dg.exp	(working copy)
@@ -14,13 +14,13 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.

-# Return 1 if compilation with -faddress-sanitizer is error-free for trivial
+# Return 1 if compilation with -fsanitize=address is error-free for trivial
 # code, 0 otherwise.

 proc check_effective_target_faddress_sanitizer {} {
     return [check_no_compiler_messages faddress_sanitizer object {
	void foo (void) { }
-    } "-faddress-sanitizer"]
+    } "-fsanitize=address"]
 }

 #
@@ -83,12 +83,12 @@ proc asan_init { args } {
     }
     if [info exists ALWAYS_CXXFLAGS] {
	set ALWAYS_CXXFLAGS [concat "{ldflags=$link_flags}" $ALWAYS_CXXFLAGS]
-	set ALWAYS_CXXFLAGS [concat "{additional_flags=-faddress-sanitizer -g}" $ALWAYS_CXXFLAGS]
+	set ALWAYS_CXXFLAGS [concat "{additional_flags=-fsanitize=address -g}" $ALWAYS_CXXFLAGS]
     } else {
Re: [PATCH] Change -faddress-sanitizer to -fsanitize=address
Thanks! Done. Committed revision 193702.

Regards,
Wei.

On Tue, Nov 20, 2012 at 11:57 PM, Jakub Jelinek wrote:
> On Tue, Nov 20, 2012 at 11:14:27PM -0800, Wei Mi wrote:
>> 2012-11-21  Wei Mi
>>
>>         * common.opt: Change faddress-sanitizer to fsanitize=address.
>>         * toplev.c (process_options): Likewise.
>>         * gcc.c (LINK_COMMAND_SPEC): Likewise.
>>         * testsuite/lib/asan-dg.exp
>>         (check_effective_target_faddress_sanitizer): Likewise.
>>         (asan_init): Likewise.
>
> Ok, thanks.
>
>>         * doc/invoke.texi (-fsanitize=[address|thread]): Document.
>
> Just change [address|thread] to address in the ChangeLog entry.
>
> 	Jakub
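[For reference, a minimal invocation with the committed spelling; the file names are hypothetical:]

  gcc -fsanitize=address -g test.c -o test

[The -g mirrors what asan_init adds in the testsuite above, so runtime reports carry source locations.]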
Re: [tsan] ThreadSanitizer instrumentation part
Thanks, fixed.

Wei.

On Wed, Nov 21, 2012 at 11:48 PM, Dmitry Vyukov wrote:
> On Thu, Nov 22, 2012 at 11:45 AM, Xinliang David Li wrote:
>>
>> On Wed, Nov 21, 2012 at 11:35 PM, Dmitry Vyukov wrote:
>> > What percent of the memory accesses can the following skip?
>> >
>> > I just don't know what exactly they mean. ADDR_EXPR/COMPONENT_REF
>> > look like they can skip a lot.
>> >
>> It does not skip a lot.
>>
>> > +  /* TODO: handle other cases
>> > +     (FIELD_DECL, ARRAY_RANGE_REF, TARGET_MEM_REF, ADDR_EXPR).  */
>>
>> The comment is not correct. The analysis should not care about
>> FIELD_DECL (covered by COMPONENT_REF), nor ADDR_EXPR. Due to the pass
>> ordering, TARGET_MEM_REF is not seen either.
>
> Wei, please update the comment to say that only ARRAY_RANGE_REF is not
> handled.
>
>> > +  if (tcode != ARRAY_REF
>> > +      && tcode != VAR_DECL
>> > +      && tcode != COMPONENT_REF
>> > +      && tcode != INDIRECT_REF
>> > +      && tcode != MEM_REF)
>> > +    return false;
>>
>> The listed cases are handled.
>>
>> David
>>
>> > On Thu, Nov 22, 2012 at 11:29 AM, Dmitry Vyukov wrote:
>> >>
>> >> +static bool
>> >> +tsan_gate (void)
>> >> +{
>> >> +  return flag_tsan != 0
>> >> +      && builtin_decl_implicit_p (BUILT_IN_TSAN_INIT);
>> >> +}
>> >>
>> >> What does it mean? Why builtin_decl_implicit_p (BUILT_IN_TSAN_INIT)?
>> >>
>> >> On Thu, Nov 22, 2012 at 11:22 AM, Wei Mi wrote:
>> >>> Hi,
>> >>>
>> >>> I updated the tsan patch against trunk and created the libtsan
>> >>> patch. Please see if it is ok.
>> >>>
>> >>> gcc/ChangeLog:
>> >>> 2012-11-22  Dmitry Vyukov
>> >>>             Wei Mi
>> >>>
>> >>>         * builtins.def (DEF_SANITIZER_BUILTIN): Define tsan builtins.
>> >>>         * sanitizer.def: Ditto.
>> >>>         * Makefile.in (tsan.o): Add tsan.o target.
>> >>>         (BUILTINS_DEF): Add sanitizer.def.
>> >>>         * passes.c (init_optimization_passes): Add tsan passes.
>> >>>         * tree-pass.h (register_pass_info): Ditto.
>> >>>         * toplev.c (compile_file): Ditto.
>> >>>         * doc/invoke.texi: Document tsan related options.
>> >>>         * gcc.c (LINK_COMMAND_SPEC): Add LIBTSAN_SPEC in link command
>> >>>         if -fsanitize=thread.
>> >>>         * tsan.c: New file about tsan.
>> >>>         * tsan.h: Ditto.
>> >>>         * common.opt: Add -fsanitize=thread.
>> >>>
>> >>> libsanitizer/ChangeLog:
>> >>> 2012-11-22  Wei Mi
>> >>>
>> >>>         * tsan: New directory. Import tsan runtime from llvm.
>> >>>         * configure.ac: Add 64 bits tsan build.
>> >>>         * Makefile.am: Likewise.
>> >>>         * configure: Regenerated.
>> >>>         * Makefile.in: Likewise.
>> >>>
>> >>> Thanks,
>> >>> Wei.
>> >>>
>> >>> On Sun, Nov 18, 2012 at 10:52 AM, Konstantin Serebryany wrote:
>> >>> > Just a comment about tsan.
>> >>> > Today, tsan works *only* on x86_64 linux (no 32-bit, no non-linux).
>> >>> > Other 64-bit platforms may be doable, but not as easy as for asan.
>> >>> > Non-linux is harder than non-x86_64 (need to support tons of libc
>> >>> > interceptors).
>> >>> > 32-bit platforms are very hard to port to; I would not bother for
>> >>> > now (this probably includes x32, which has cheap atomic 64-bit
>> >>> > loads/stores, but has too small an address space for tsan).
>> >>> >
>> >>> > Conclusion: when committing tsan code, please make sure it is
>> >>> > enabled only on x86_64.
>> >>> >
>> >>> > --kcc
>> >>> >
>> >>> > On Sat, Nov 17, 2012 at 3:13 AM, Wei Mi wrote:
>> >>> >> Hi,
>> >>> >>
>> >>> >> Is it ok for the trunk?
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Wei.
>> >>> >>
>> >>> >> On Tue, Nov 13, 2012 at 5:06 PM, Wei Mi wrote:
>> >>> >>> Thanks for catching this. I updated the patch.
>> >>> >>>
>> >>> >>> Regards,
>> >>> >>> Wei.
>> >>> >>>
>> >>> >>> On Tue, Nov 13, 2012 at 4:54 PM, Richard Henderson wrote:
>> >>> >>>> On 11/13/2012 04:08 PM, Wei Mi wrote:
>> >>> >>>>> +extern void tsan_finish_file (void);
>> >>> >>>>> +
>> >>> >>>>> +#endif /* TREE_TSAN */
>> >>> >>>>> +/* ThreadSanitizer, a data race detector.
>> >>> >>>>> +   Copyright (C) 2011 Free Software Foundation, Inc.
>> >>> >>>>> +   Contributed by Dmitry Vyukov
>> >>> >>>>
>> >>> >>>> Careful, you've got doubly-applied patches there.
>> >>> >>>>
>> >>> >>>> r~
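[For readers not steeped in GIMPLE, an illustrative mapping from the tree codes the pass accepts to source-level accesses; this follows standard tree-code semantics and is not quoted from the patch:]

int g;                  /* VAR_DECL:      g = 1;   */
struct s { int f; } v;  /* COMPONENT_REF: v.f = 1; */
int a[10];              /* ARRAY_REF:     a[2] = 1; */
int *p;                 /* MEM_REF / INDIRECT_REF: *p = 1; */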
Re: [tsan] ThreadSanitizer instrumentation part
Thanks. I checked in the code.
Committed revision 193736.
Committed revision 193737.

Wei.

On Thu, Nov 22, 2012 at 1:54 AM, Jakub Jelinek wrote:
> On Wed, Nov 21, 2012 at 11:22:51PM -0800, Wei Mi wrote:
>> I updated the tsan patch against trunk and created the libtsan
>> patch. Please see if it is ok.
>>
>> gcc/ChangeLog:
>> 2012-11-22  Dmitry Vyukov
>>             Wei Mi
>>
>>         * builtins.def (DEF_SANITIZER_BUILTIN): Define tsan builtins.
>>         * sanitizer.def: Ditto.
>>         * Makefile.in (tsan.o): Add tsan.o target.
>>         (BUILTINS_DEF): Add sanitizer.def.
>>         * passes.c (init_optimization_passes): Add tsan passes.
>>         * tree-pass.h (register_pass_info): Ditto.
>>         * toplev.c (compile_file): Ditto.
>>         * doc/invoke.texi: Document tsan related options.
>>         * gcc.c (LINK_COMMAND_SPEC): Add LIBTSAN_SPEC in link command
>>         if -fsanitize=thread.
>>         * tsan.c: New file about tsan.
>>         * tsan.h: Ditto.
>>         * common.opt: Add -fsanitize=thread.
>>
>> libsanitizer/ChangeLog:
>> 2012-11-22  Wei Mi
>>
>>         * tsan: New directory. Import tsan runtime from llvm.
>>         * configure.ac: Add 64 bits tsan build.
>>         * Makefile.am: Likewise.
>>         * configure: Regenerated.
>>         * Makefile.in: Likewise.
>
> Ok, thanks. The comments can be fixed up incrementally.
>
> 	Jakub
Re: [tsan] ThreadSanitizer instrumentation part
Hi Jack,

Kostya mentioned in a previous mail that tsan is only supported on
x86_64 linux (no 32-bit, no non-linux) for now, so the tsan build
should be disabled on platforms other than x86_64-linux. Thanks to
Jakub, who will provide another patch including this fix soon.

Thanks,
Wei.

On Thu, Nov 22, 2012 at 3:18 PM, Jack Howarth wrote:
> On Thu, Nov 22, 2012 at 02:08:07PM -0800, Wei Mi wrote:
>> Thanks. I checked in the code.
>> Committed revision 193736.
>> Committed revision 193737.
>>
>> Wei.
>
> Wei,
>    Unlike libasan, we seem to have issues building libtsan on darwin
> using the currently proposed patch...
>
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01817.html
>
> The build fails at...
>
> libtool: compile:
> /sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/./gcc/g++
> -B/sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/./gcc/ -nostdinc++
> -nostdinc++
> -I/sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/x86_64-apple-darwin12.2.0/libstdc++-v3/include/x86_64-apple-darwin12.2.0
> -I/sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/x86_64-apple-darwin12.2.0/libstdc++-v3/include
> -I/sw/src/fink.build/gcc48-4.8.0-1000/gcc-4.8-20121122/libstdc++-v3/libsupc++
> -I/sw/src/fink.build/gcc48-4.8.0-1000/gcc-4.8-20121122/libstdc++-v3/include/backward
> -I/sw/src/fink.build/gcc48-4.8.0-1000/gcc-4.8-20121122/libstdc++-v3/testsuite/util
> -L/sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/x86_64-apple-darwin12.2.0/libstdc++-v3/src
> -L/sw/src/fink.build/gcc48-4.8.0-1000/darwin_objdir/x86_64-apple-darwin12.2.0/libstdc++-v3/src/.libs
> -B/sw/lib/gcc4.8/x86_64-apple-darwin12.2.0/bin/
> -B/sw/lib/gcc4.8/x86_64-apple-darwin12.2.0/lib/
> -isystem /sw/lib/gcc4.8/x86_64-apple-darwin12.2.0/include
> -isystem /sw/lib/gcc4.8/x86_64-apple-darwin12.2.0/sys-include
> -D_GNU_SOURCE -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -D__STDC_LIMIT_MACROS -I.
> -I../../../../gcc-4.8-20121122/libsanitizer/tsan
> -I ../../../../gcc-4.8-20121122/libsanitizer
> -I ../../../../gcc-4.8-20121122/libsanitizer/include
> -Wall -W -Wno-unused-parameter -Wwrite-strings -pedantic -Wno-long-long
> -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables
> -fvisibility=hidden -Wno-variadic-macros -Wno-c99-extensions -g -O2
> -MT tsan_interceptors.lo -MD -MP -MF .deps/tsan_interceptors.Tpo
> -c ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc
> -fno-common -DPIC -o .libs/tsan_interceptors.o
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc: In function ‘unsigned int wrap_sleep(unsigned int)’:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:164:27: error: ‘sleep’ was not declared in this scope
>    unsigned res = sleep(sec);
>                   ^
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc: In function ‘int wrap_usleep(long_t)’:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:171:24: error: ‘usleep’ was not declared in this scope
>    int res = usleep(usec);
>              ^
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc: In function ‘int wrap_nanosleep(void*, void*)’:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:178:31: error: ‘nanosleep’ was not declared in this scope
>    int res = nanosleep(req, rem);
>              ^
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc: In function ‘void (* wrap_signal(int, sighandler_t))(int)’:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:1270:38: error: ‘sigaction’ was not declared in this scope
>    int res = sigaction(sig, &act, &old);
>              ^
> In file included from ../../../../gcc-4.8-20121122/libsanitizer/interception/interception.h:184:0,
>                  from ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.h:15,
>                  from ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:16:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc: In function ‘void __tsan::InitializeInterceptors()’:
> ../../../../gcc-4.8-20121122/libsanitizer/tsan/tsan_interceptors.cc:1377:18: error: ‘longjmp’ was not declared in this scope
>    TSAN_INTERCEPT(longjmp);
>    ^
> ../../../../gcc-4.8-20121122/libsanitizer/interception/interception_mac.h:41:35: note: in definition of macro ‘OVERRIDE_FUNCTION_MAC’
>    (::__interception::uptr)old_func, \
>    ^
> ../../../../gcc-4.8-20121122/libsanitizer/interception/interception.h:187:35: note: in
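[A minimal sketch of the kind of fix promised above, assuming a TSAN_SUPPORTED conditional in libsanitizer/configure.ac; the name and exact shape are assumptions, not Jakub's actual patch:]

# Only build libtsan on targets where the runtime is supported.
case "${target}" in
  x86_64-*-linux*) TSAN_SUPPORTED=yes ;;
  *)               TSAN_SUPPORTED=no ;;
esac
AM_CONDITIONAL([TSAN_SUPPORTED], [test "x$TSAN_SUPPORTED" = "xyes"])

[Makefile.am would then descend into the tsan subdirectory only when the conditional holds, keeping darwin and other hosts from compiling tsan_interceptors.cc at all.]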
Re: [PATCH] asan unit tests from llvm lit-test
Hi Jakub, thank you for your detailed comments! I will fix the tests
accordingly. About the lto options: the llvm tests don't include them
either, so I skipped them in the torture options. Is it because in most
cases we only use asan under -O1/-O2? Kostya, could you tell us whether
there is any reason not to test lto+asan on the llvm side?

Thanks,
Wei.

On Mon, Dec 3, 2012 at 3:00 AM, Jakub Jelinek wrote:
> Hi!
>
> Mike, CCing you especially on the proposed lib/gcc-dg.exp dg-env-var
> changes, and I have one question about cleanup of files (file delete
> vs. remote_file target (or is that host or build) delete).
> But of course if you could eyeball the rest and comment, I'd be even
> happier.
>
> On Fri, Nov 30, 2012 at 12:35:35PM -0800, Wei Mi wrote:
>> Thanks for the comments! Here is the second version patch. Please see
>> if it is ok.
>> (-Wno-attributes is kept or else we will get a warning because of
>> __attribute__((always_inline))).
>
>> --- gcc/testsuite/gcc.dg/asan/asan.exp	(revision 194002)
>> +++ gcc/testsuite/gcc.dg/asan/asan.exp	(working copy)
>> @@ -30,6 +30,10 @@ if ![check_effective_target_faddress_san
>>  dg-init
>>  asan_init
>>
>> +# Set default torture options
>> +set default_asan_torture_options [list { -O0 } { -O1 } { -O2 } { -O3 }]
>> +set-torture-options $default_asan_torture_options
>
> Why this? What is undesirable on the default torture options?
> Do those tests fail with lto or similar?

The tests on the llvm side don't contain an lto option, so I did the
same. Some tests fail with lto because of more aggressive inlining.

>> Index: gcc/testsuite/c-c++-common/asan/strip-path-prefix-1.c
>> ===================================================================
>> --- gcc/testsuite/c-c++-common/asan/strip-path-prefix-1.c	(revision 0)
>> +++ gcc/testsuite/c-c++-common/asan/strip-path-prefix-1.c	(revision 0)
>> @@ -0,0 +1,14 @@
>> +/* { dg-do run } */
>> +/* { dg-skip-if "" { *-*-* } { "*" } { "-O2 -m64" } } */
>
> The -m64 here is just wrong. If you want to run the test only
> for -O2 and x86_64-linux compilation (why? what is so specific
> about it to that combination?), then you'd do
> /* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
> /* { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
> or so. But again, why?

I copied it from the llvm test. I think it just considers the -m64 test
enough to check the feature.

>> --- gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c	(revision 0)
>> +++ gcc/testsuite/c-c++-common/asan/force-inline-opt0-1.c	(revision 0)
>> @@ -0,0 +1,16 @@
>> +/* This test checks that we are not instrumenting a memory access twice
>> +   (before and after inlining).  */
>> +
>> +/* { dg-do run } */
>> +/* { dg-options "-Wno-attributes" } */
>> +/* { dg-skip-if "" { *-*-* } { "*" } { "-O0 -m64" "-O1 -m64" } } */
>
> As I said above. Why is this not tested for 32-bit testing?
> From the name, the -O0/-O1 limit could make sense, but then even for
> -O2 and above it should do the same.

I also copied it from llvm.

>> --- gcc/testsuite/g++.dg/asan/deep-stack-uaf-1.C	(revision 0)
>> +++ gcc/testsuite/g++.dg/asan/deep-stack-uaf-1.C	(revision 0)
>> @@ -0,0 +1,33 @@
>> +// Check that we can store lots of stack frames if asked to.
>> +
>> +// { dg-do run }
>> +// { dg-env-var ASAN_OPTIONS "malloc_context_size=120:redzone=512" }
>> +// { dg-shouldfail "asan" }
>
> Can you please replace the two spaces after // with just one?
> Dejagnu directives are often quite long, and thus it is IMHO better
> not to make the lines longer than necessary.
> For this test, don't you need
> // { dg-options "-fno-optimize-sibling-calls" }
> and __attribute__((noinline)) on the free method? Otherwise I'd expect
> that either at least at -O3 it could be all inlined, or if not inlined,
> then at least tail call optimized (and thus not showing up in the
> backtrace either).

Ok, will fix it.

>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +
>> +template <int depth>
>> +struct DeepFree {
>> +  static void free(char *x) {
>> +    DeepFree<depth - 1>::free(x);
>> +  }
>> +};
>> +
>> +template<>
>> +struct DeepFree<0> {
>> +  static void free(char *x) {
>> +    ::free(x);
>> +  }
>> +};
>> +
>> +int main() {
>> +  char *x = new char[10];
>> +  // deep_free(x);
>> +  DeepFree<200>::free(x);
>> +  return x[5];
>> +}
>> +
>> +// { dg-output "
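[Since the review leans on the proposed dg-env-var directive, here is a rough sketch of what such a directive in lib/gcc-dg.exp could look like; this is an assumed implementation for illustration, not the patch Jakub references:]

# Record an environment variable to set while the test executes; the
# framework would export it before running the test and restore the
# previous value afterwards.
proc dg-env-var { args } {
    global set_env_var
    # args: { line-number var-name var-value }
    lappend set_env_var [list [lindex $args 1] [lindex $args 2]]
}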