Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)

Richard Guenther Fri, 28 Oct 2011 04:35:38 -0700

On Fri, 28 Oct 2011, Richard Guenther wrote:

> On Fri, 28 Oct 2011, Jakub Jelinek wrote:
> 
> > On Fri, Oct 28, 2011 at 12:59:48PM +0200, Richard Guenther wrote:
> > > It is also because of re-use of memory via memcpy (yes, some dubious
> > > TBAA case from C, but essentially we don't want to break that).  Thus
> > > we can't use TBAA on anonymous memory.
> > 
> > No, IMHO we always use a ref_all mem access in that case.
> > If you meant something like:
> > 
> > void
> > foo (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> >     {
> >       int tem;
> >       __builtin_memcpy (&tem, &intptr[i], sizeof (tem));
> >       floatptr[i] = (float) tem;
> >     }
> > }
> > 
> > which is valid C even if intptr == floatptr, we have:
> > 
> > <bb 2>:
> > 
> > <bb 3>:
> >   # i_21 = PHI <i_14(4), 0(2)>
> >   # ivtmp.12_27 = PHI <ivtmp.12_26(4), 256(2)>
> >   D.2709_3 = (long unsigned int) i_21;
> >   D.2710_4 = D.2709_3 * 4;
> >   D.2711_6 = intptr_5(D) + D.2710_4;
> >   D.2712_7 = MEM[(char * {ref-all})D.2711_6];
> >   D.2713_11 = floatptr_10(D) + D.2710_4;
> >   D.2715_13 = (float) D.2712_7;
> >   *D.2713_11 = D.2715_13;
> >   i_14 = i_21 + 1;
> >   ivtmp.12_26 = ivtmp.12_27 - 1;
> >   if (ivtmp.12_26 != 0)
> >     goto <bb 4>;
> >   else
> >     goto <bb 5>;
> > 
> > <bb 4>:
> >   goto <bb 3>;
> > 
> > which is just fine even with TBAA.
> > And similarly for
> > void
> > bar (int *intptr, float *floatptr)
> > {
> >   int i;
> >   for (i = 0; i < 256; ++i)
> >     {
> >       float tem;
> >       tem = (float) intptr[i];
> >       __builtin_memcpy (&floatptr[i], &tem, sizeof (tem));
> >     }
> > }
> > 
> > where the ref-all isn't used for load, but for store.
> 
> Well, yeah.  I said it's probably difficult to generate a
> C testcase.  It's still valid middle-end IL (and well-defined) to have
> intptr == floatptr and  MEM[(int *)..] and MEM[(float *)...].


Btw, only the exact overlap case is critical.  For non-exact overlap
like

  for (i)
   {
    float[i] = int[i-1] + int[i];
   }

you can reason that there cannot be aliasing: if you execute this
loop more than once(!) then you'd have

  float[i] = int[i-1] + int[i];
  float[i+1] = int[i] + int[i+1];
...

where the 2nd load from int[i] would load from float-initialized
memory, which is undefined.  Thus you can assume that float != int.
But that requires a more thorough analysis than we do at the
moment, plus the knowledge that the loop will iterate at least N
times (when called from the vectorizer, N is the vectorization
factor, which is at least 2).
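
To make the iteration argument concrete, here is a minimal C sketch.  The
names sum_adjacent and stale_int_loads are hypothetical and not part of
the patch or of GCC.  The first function is the loop above written with
distinct arrays.  The second only tracks indices (it never type-puns any
memory) and counts how many int loads would read a slot that an earlier
iteration had already stored as float, assuming both pointers shared the
same base address.

```c
#include <stdbool.h>
#include <stddef.h>

/* The loop from the mail with hypothetical names:
   dst[i] = src[i-1] + src[i] for i = 1 .. n-1.  */
void
sum_adjacent (float *dst, const int *src, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    dst[i] = (float) (src[i - 1] + src[i]);
}

/* Assume (hypothetically) that dst and src have the same base address.
   Count the int loads that would hit a slot already stored as float by
   an earlier iteration.  Only indices are simulated here, so the
   function itself has no aliasing problem.  n is capped at MAX_N.  */
size_t
stale_int_loads (size_t n)
{
  enum { MAX_N = 64 };
  bool stored_as_float[MAX_N] = { false };
  size_t stale = 0;

  if (n > MAX_N)
    n = MAX_N;

  for (size_t i = 1; i < n; ++i)
    {
      if (stored_as_float[i - 1])   /* load of src[i-1] */
        ++stale;
      if (stored_as_float[i])       /* load of src[i] */
        ++stale;
      stored_as_float[i] = true;    /* store to dst[i] */
    }
  return stale;
}
```

With a single iteration (n == 2) the count is zero, so nothing can be
concluded.  From the second iteration on, every iteration would contain
one such load, which is why the reasoning needs the loop to run more than
once.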

Richard.
