See also http://gcc.gnu.org/ml/gcc/2013-08/msg00259.html
There are some concerns, but it would be interesting to do some benchmarking of this.

David

On Tue, Feb 4, 2014 at 8:27 AM, Bingfeng Mei <b...@broadcom.com> wrote:
> Hi,
> One of the biggest issues we have with GCC vectorization is bloated code size.
> For example, the vectorized version of the following simple code is 2.5 times
> the size of the non-vectorized one. One reason is that GCC often creates one
> extra loop copy because of aliasing/alignment and one epilog loop because of
> the loop iteration count constraint.
>
> void foo (int *a, int *b, int N)
> {
>   int i;
>   for (i = 0; i < N; i++)
>     {
>       a[i] = b[i];
>     }
> }
>
> Looking closely, the epilog loop and the alignment/aliasing loop are almost
> identical; they differ only in the initial values of some variables entering
> the loop. Can they be merged into one in such situations? If so, any
> suggestions on how to implement it?
>
>   ...
>   <bb 7>:
>   # i_39 = PHI <i_47(8), i_50(10)>
>   _41 = (long unsigned int) i_39;
>   _42 = _41 * 4;
>   _43 = a_7(D) + _42;
>   _44 = b_9(D) + _42;
>   _45 = *_44;
>   *_43 = _45;
>   i_47 = i_39 + 1;
>   if (N_4(D) > i_47)
>     goto <bb 8>;
>   else
>     goto <bb 15>;
>
>   <bb 8>:
>   goto <bb 7>;
>
>   <bb 9>:
>   # i_51 = PHI <i_13(6)>
>   tmp.6_56 = (int) ratio_mult_vf.5_38;
>   if (niters.3_34 == ratio_mult_vf.5_38)
>     goto <bb 16>;
>   else
>     goto <bb 10>;
>
>   <bb 10>:
>   # i_50 = PHI <tmp.6_56(9), 0(4)>
>   goto <bb 7>;
>
>   <bb 11>:
>   goto <bb 6>;
>
>   <bb 12>:
>
>   <bb 13>:
>   # i_24 = PHI <0(12), i_32(14)>
>   _26 = (long unsigned int) i_24;
>   _27 = _26 * 4;
>   _28 = a_7(D) + _27;
>   _29 = b_9(D) + _27;
>   _30 = *_29;
>   *_28 = _30;
>   i_32 = i_24 + 1;
>   if (N_4(D) > i_32)
>     goto <bb 14>;
>   else
>     goto <bb 17>;
>   ...
>
> Thanks,
> Bingfeng
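For illustration, the merge Bingfeng asks about can be sketched by hand at the source level. This is a hypothetical C sketch, not what GCC's vectorizer emits: the vectorization factor VF = 4, the use of a four-int memcpy as a stand-in for one vector load/store, and the names scalar_copy_from, ranges_overlap and foo_merged are all assumptions made up for this example, and alignment peeling is left out entirely. The point is that both the aliasing fallback (start at 0) and the epilog (start at n rounded down to a multiple of VF) enter one shared scalar loop, similar in spirit to the PHI in <bb 10>, which appears to select either tmp.6_56 or 0 as the initial i.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vectorization factor: 4 ints per "vector" iteration. */
enum { VF = 4 };

/* The single scalar loop: copies b[start..n) to a[start..n).
   Both the alias-fallback path (start == 0) and the epilog
   (start == n rounded down to VF) enter here. */
static void scalar_copy_from(int *a, const int *b, int start, int n)
{
    assert(start >= 0);
    for (int i = start; i < n; i++)
        a[i] = b[i];
}

/* Conservative runtime alias check: do [a, a+n) and [b, b+n) overlap?
   Comparing addresses through uintptr_t is implementation-defined but
   works on common flat-address targets. */
static int ranges_overlap(const int *a, const int *b, int n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t len = (uintptr_t)n * sizeof(int);
    return pa < pb + len && pb < pa + len;
}

/* Hand-"vectorized" version of foo() with one merged scalar loop. */
void foo_merged(int *a, const int *b, int n)
{
    int start = 0;  /* fallback: run every iteration in the scalar loop */

    if (n > 0 && !ranges_overlap(a, b, n)) {
        int vec_end = n - n % VF;  /* analogous to ratio_mult_vf */
        for (int i = 0; i < vec_end; i += VF)
            /* stands in for one vector load + store of VF ints */
            memcpy(&a[i], &b[i], VF * sizeof(int));
        start = vec_end;           /* epilog resumes where the vector loop stopped */
    }

    scalar_copy_from(a, b, start, n);
}
```

With this shape the scalar body exists only once, at the cost of the vector path always falling through into it (even when no remainder iterations are left, where GCC's dump instead branches straight past the epilog when niters equals ratio_mult_vf). Whether that trade is worth it is exactly what benchmarking would need to show.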