On Tue, Feb 4, 2014 at 5:27 PM, Bingfeng Mei <b...@broadcom.com> wrote:
> Hi,
> One of biggest issues we have with GCC vectorization is bloated code size.
> For example, vectorized version is 2.5 times of non-vectorized one for the
> following simple code. One reason is that GCC often creates one loop copy
> because of aliasing/alignment and one epilog loop because of loop iteration
> constraint.

One thing to improve is to reduce the cases where we apply peeling for
alignment - by more properly modelling the cost effect for example
(also by considering that when you align 'a' then you might spuriously
misalign 'b').

Another idea is (if the target supports misaligned accesses) to do
both prologue and epilogue in vector code by doing redundant work
(overlap with the first / last vector iterations) and thus avoid
creating a loop for the prologue / epilogue. Of course that has
constraints on the kind of operations that are supported (likely more
difficult if reductions / inductions are involved or if there are
dependences to be honored).

Richard.

> void foo (int *a, int *b, int N)
> {
>   int i;
>   for (i = 0; i < N; i++)
>     {
>       a[i] = b[i];
>     }
> }
>
> Looking closely, the epilog loop and alignment/aliasing loop are almost
> identical, just different in initial values for some variables entering
> the loop. Can they be merged into one in such situations? If yes, any
> suggestion on how to implement it?
>
> ...
>   <bb 7>:
>   # i_39 = PHI <i_47(8), i_50(10)>
>   _41 = (long unsigned int) i_39;
>   _42 = _41 * 4;
>   _43 = a_7(D) + _42;
>   _44 = b_9(D) + _42;
>   _45 = *_44;
>   *_43 = _45;
>   i_47 = i_39 + 1;
>   if (N_4(D) > i_47)
>     goto <bb 8>;
>   else
>     goto <bb 15>;
>
>   <bb 8>:
>   goto <bb 7>;
>
>   <bb 9>:
>   # i_51 = PHI <i_13(6)>
>   tmp.6_56 = (int) ratio_mult_vf.5_38;
>   if (niters.3_34 == ratio_mult_vf.5_38)
>     goto <bb 16>;
>   else
>     goto <bb 10>;
>
>   <bb 10>:
>   # i_50 = PHI <tmp.6_56(9), 0(4)>
>   goto <bb 7>;
>
>   <bb 11>:
>   goto <bb 6>;
>
>   <bb 12>:
>
>   <bb 13>:
>   # i_24 = PHI <0(12), i_32(14)>
>   _26 = (long unsigned int) i_24;
>   _27 = _26 * 4;
>   _28 = a_7(D) + _27;
>   _29 = b_9(D) + _27;
>   _30 = *_29;
>   *_28 = _30;
>   i_32 = i_24 + 1;
>   if (N_4(D) > i_32)
>     goto <bb 14>;
>   else
>     goto <bb 17>;
> ...
>
> Thanks,
> Bingfeng
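
For illustration, the overlap idea from the reply (handle the leftover
iterations with one extra vector iteration that redoes part of the last
full one) could look roughly like the hand-written C below. This is only
a sketch of the source-level shape, not what GCC generates. The names
foo_overlap, copy_vec and VF are made up for the example, VF = 4 is an
assumed vectorization factor, and the memcpy pair stands in for a
misaligned vector load and store. It also assumes a and b do not
overlap, which is what makes the redundant copies harmless.

```c
#include <string.h>

enum { VF = 4 };  /* assumed vectorization factor: 4 ints per vector */

/* Stand-in for one misaligned vector load followed by a vector store. */
static void copy_vec(int *dst, const int *src)
{
    int tmp[VF];
    memcpy(tmp, src, sizeof tmp);
    memcpy(dst, tmp, sizeof tmp);
}

/* Same result as foo() above for non-overlapping a and b. */
void foo_overlap(int *restrict a, const int *restrict b, int n)
{
    int i;

    /* Fewer than VF elements: no full vector fits, stay scalar. */
    if (n < VF) {
        for (i = 0; i < n; i++)
            a[i] = b[i];
        return;
    }

    /* Main vector loop over all full groups of VF elements. */
    for (i = 0; i + VF <= n; i += VF)
        copy_vec(a + i, b + i);

    /* Leftover elements: one more vector iteration placed at n - VF.
       It overlaps the last full iteration and rewrites some elements
       with the same values, so no scalar epilogue loop is needed. */
    if (i < n)
        copy_vec(a + n - VF, b + n - VF);
}
```

The same trick at the start of the array would replace the alignment
prologue. It works here because a plain copy can be repeated without
changing the result; a reduction such as sum += b[i] would count the
overlapped elements twice, which is the kind of constraint mentioned in
the reply.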