On Mon, Jun 13, 2016 at 12:06 PM, Bin Cheng <bin.ch...@arm.com> wrote:
> Hi,
> Take subroutine "DACOP" from spec2k/200.sixtrack as an example; the loop
> needs to be versioned for vectorization because of possible aliasing.  The
> alias checks for the data references look like:
>
> //pair 1
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: _92 = da.cc[_27];
>   ref: da.cc[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.cc[_93] = _92;
>   ref: da.cc[_93];
> )
> //pair 2
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: pretmp_29 = da.i2[_27];
>   ref: da.i2[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.i2[_93] = pretmp_29;
>   ref: da.i2[_93];
> )
> //pair 3
> dr_a:
> (Data Ref:
>   bb: 8
>   stmt: pretmp_28 = da.i1[_27];
>   ref: da.i1[_27];
> )
> dr_b:
> (Data Ref:
>   bb: 8
>   stmt: da.i1[_93] = pretmp_28;
>   ref: da.i1[_93];
> )
>
> The code generated for the alias checks is as below:
>
> <bb 23>:
>   # iy_186 = PHI <_413(22), 2(2)>
>   # ivtmp_1050 = PHI <ivtmp_1049(22), 512(2)>
>   _155 = iy_186 + -2;
>   _156 = _155 * 516;
>   _241 = iy_186 + -1;
>   _242 = _241 * 516;
>   _328 = iy_186 * 516;
>   _413 = iy_186 + 1;
>   _414 = _413 * 516;
>   _499 = iy_186 + 2;
>   _500 = _499 * 516;
>   _998 = iy_186 * 516;
>   _997 = (sizetype) _998;
>   _996 = _997 + 6;
>   _995 = _996 * 4;
>   _994 = global_Output.2_16 + _995;
>   _993 = iy_186 * 516;
>   _992 = (long unsigned int) _993;
>   _991 = _992 * 4;
>   _990 = _991 + 18446744073709547488;
>   _989 = global_Input.0_153 + _990;
>   _886 = _989 >= _994;
>   _885 = iy_186 * 516;
>   _884 = (sizetype) _885;
>   _883 = _884 + 1040;
>   _882 = _883 * 4;
>   _881 = global_Input.0_153 + _882;
>   _880 = (sizetype) _998;
>   _879 = _880 + 2;
>   _878 = _879 * 4;
>   _877 = global_Output.2_16 + _878;
>   _876 = _877 >= _881;
>   _875 = _876 | _886;
>   _874 = iy_186 * 516;
>   _873 = (sizetype) _874;
>   _872 = _873 + 514;
>   _871 = _872 * 4;
>   _870 = global_Output.2_16 + _871;
>   _869 = local_Filter_33 >= _870;
>   _868 = local_Filter_33 + 100;
>   _867 = (sizetype) _874;
>   _866 = _867 + 2;
>   _865 = _866 * 4;
>   _864 = global_Output.2_16 + _865;
>   _863 = _864 >= _868;
>   _862 = _863 | _869;
>   _861 = _862 & _875;
>   if (_861 != 0)
>     goto <bb 7>;
>   else
>     goto <bb 4>;
>
> This contains quite a lot of redundant computation.  The root cause is that
> the vectorizer simply translates the alias checks into comparisons of full
> address expressions, leaving the CSE opportunities to fold.  This patch
> improves vect_create_cond_for_alias_checks by comparing the
> DR_BASE_ADDRESS/DR_INIT parts of both data references at compilation time.
> It also simplifies conditions of the form:
>   (addr_a_min + addr_a_length) <= addr_b_min
>     || (addr_b_min + addr_b_length) <= addr_a_min
> into:
>   cond_expr = addr_b_min - addr_a_min
>   cond_expr >= addr_a_length || cond_expr <= -addr_b_length
> if the comparison is done in a signed type.  This can be further simplified
> by fold if addr_a_length and addr_b_length are equal/constant, which is
> quite common.
> I looked at the generated assembly; the patch does introduce small
> regressions in some cases, but overall I think it is a win.  Bootstrapped
> and tested on x86_64 and AArch64.  Is it OK?
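For reference, here is a minimal standalone C sketch of the check simplification
described in the quoted mail.  The function and variable names (no_overlap_full,
no_overlap_diff, len_a, len_b) are made up for illustration and do not appear in
the patch, and the sketch assumes the distance between the two segments fits a
signed type without overflow, as the mail requires:

    #include <stdbool.h>
    #include <stddef.h>

    /* Original form of the runtime alias check: the segments
       [addr_a, addr_a + len_a) and [addr_b, addr_b + len_b)
       must not overlap.  */
    static bool
    no_overlap_full (ptrdiff_t addr_a, ptrdiff_t len_a,
                     ptrdiff_t addr_b, ptrdiff_t len_b)
    {
      return addr_a + len_a <= addr_b || addr_b + len_b <= addr_a;
    }

    /* Simplified form: compute the signed distance between the two
       segment starts once, then compare it against the two lengths.
       Assumes no signed overflow in the subtraction.  */
    static bool
    no_overlap_diff (ptrdiff_t addr_a, ptrdiff_t len_a,
                     ptrdiff_t addr_b, ptrdiff_t len_b)
    {
      ptrdiff_t diff = addr_b - addr_a;
      return diff >= len_a || diff <= -len_b;
    }

When len_a and len_b are the same compile-time constant, the second form is a
single subtraction followed by a symmetric range test on diff, which is the
case the mail expects fold to collapse further.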
Hmm.  I think it's obviously good to make the generated expressions smaller.
Ideally we'd know the dependence distance here (but these pairs were added to
the runtime checks exactly because the distance was symbolic - dependence
analysis doesn't give us this information).  This means we often can't
optimize the check down to i0 - i1 >= 4 but retain useless
widenings/multiplications.  Does your patch help with those?

   int cmp = tree_int_cst_compare (init_a, init_b);
+  /* No need to consider init parts if they are equal in both DRs.  */
+  if (wi::to_widest (init_a) != wi::to_widest (init_b))

cmp != 0

+    {
+      if (wi::to_widest (init_a) < wi::to_widest (init_b))

cmp < 0

+        {

....

This code seems to mimic some comparison folding we have?  Does that not
trigger for some reason?  That said, not adding the address-base will make
that not trigger if the remainder of the ops are unsigned.

Doing intermittent force_gimple_operand will make fold not see the whole
trees and thus simplifications get lost.  In fact you're mixing gimple stmt
building and generic expr building - ideally you'd use gimple_build (...)
for all of it (but that doesn't work as all the DR_ components are GENERIC).

So I think we have to be careful with this kind of micro-optimization patch.
If you'd extracted the "indices" to compare somehow that would be great.
Another option would be to use tree-affine to simplify the compares.

Richard.

> Thanks,
> bin
>
> 2016-06-08  Bin Cheng  <bin.ch...@arm.com>
>
>         * tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): New
>         parameter.  Simplify alias check conditions at compilation time.
>         (vect_loop_versioning): Pass new argument to above function.
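As a rough sketch of the "extract the indices to compare" idea mentioned above
(illustrative C only, with made-up names, not GCC internals): for a pair like
da.cc[_27] vs. da.cc[_93], which share the same base and element size, the
runtime check can be phrased on the index distance alone, so the widenings and
multiplications from the address form drop out:

    #include <stdbool.h>

    /* Address form, roughly what the vectorizer emits today: build both
       segment bounds from base + index * elt_size and compare them.  */
    static bool
    no_alias_addresses (char *base, long i, long j, long elt_size, long vf)
    {
      char *read_lo  = base + i * elt_size;
      char *read_hi  = base + (i + vf) * elt_size;
      char *write_lo = base + j * elt_size;
      char *write_hi = base + (j + vf) * elt_size;
      return read_hi <= write_lo || write_hi <= read_lo;
    }

    /* Index form: with a common base and a positive element size, only
       the signed index distance matters.  */
    static bool
    no_alias_indices (long i, long j, long vf)
    {
      long dist = j - i;
      return dist >= vf || dist <= -vf;
    }

This only illustrates the shape of the comparison; how the vectorizer would
actually recover such indices from the DR_* components is a separate question.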