On Wed, 12 Mar 2014, Cong Hou wrote: > Thank you for pointing it out. I didn't realized that alias analysis > has influences on this issue. > > The current problem is that the epilogue may be unnecessary if the > loop bound cannot be larger than the number of iterations of the > vectorized loop multiplied by VF when the vectorized loop is supposed > to be executed. My method is incorrect because I assume the vectorized > loop will be executed which is actually guaranteed by loop bound check > (and also alias checks). So if the alias checks exist, my method is > fine as both conditions are met.
But there is still the loop bound check which, if it fails, uses the epilogue loop as fallback, not the scalar versioned loop. > If there is no alias checks, I must > consider the possibility that the vectorized loop may not be executed > at runtime and then the epilogue should not be eliminated. The warning > appears on epilogue, and with loop bound checks (and without alias > checks) the warning will be gone. So I think the key is alias checks: > my method only works if there is no alias checks. > > How about adding one more condition that checks if alias checks are > needed, as the code shown below? > > else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo)) > < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > && (!LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo) > || (unsigned HOST_WIDE_INT)max_stmt_executions_int > (LOOP_VINFO_LOOP (loop_vinfo)) > (unsigned)th))) > LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true; > > > thanks, > Cong > > > On Wed, Mar 12, 2014 at 1:24 AM, Jakub Jelinek <ja...@redhat.com> wrote: > > On Tue, Mar 11, 2014 at 04:16:13PM -0700, Cong Hou wrote: > >> This patch is fixing PR60505 in which the vectorizer may produce > >> unnecessary epilogues. > >> > >> Bootstrapped and tested on a x86_64 machine. > >> > >> OK for trunk? > > > > That looks wrong. Consider the case where the loop isn't versioned, > > if you disable generation of the epilogue loop, you end up only with > > a vector loop. > > > > Say: > > unsigned char ovec[16] __attribute__((aligned (16))) = { 0 }; > > void > > foo (char *__restrict in, char *__restrict out, int num) > > { > > int i; > > > > in = __builtin_assume_aligned (in, 16); > > out = __builtin_assume_aligned (out, 16); > > for (i = 0; i < num; ++i) > > out[i] = (ovec[i] = in[i]); > > out[num] = ovec[num / 2]; > > } > > -O2 -ftree-vectorize. Now, consider if this function is called > > with num != 16 (num > 16 is of course invalid, but num 0 to 15 is > > valid and your patch will cause a wrong-code in this case). > > > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer