On Wed, Oct 30, 2013 at 10:53:58AM +0100, Ondřej Bílka wrote:
> > Yesterday I've noticed that for AVX which allows unaligned operands in
> > AVX arithmetics instructions we still don't combine unaligned loads with the
> > AVX arithmetics instructions.  So say for -O2 -mavx -ftree-vectorize
> > void
> > f1 (int *__restrict e, int *__restrict f)
> > {
> >   int i;
> >   for (i = 0; i < 1024; i++)
> >     e[i] = f[i] * 7;
> > }
> >
> > void
> > f2 (int *__restrict e, int *__restrict f)
> > {
> >   int i;
> >   for (i = 0; i < 1024; i++)
> >     e[i] = f[i];
> > }
> > we have:
> > 	vmovdqu	(%rsi,%rax), %xmm0
> > 	vpmulld	%xmm1, %xmm0, %xmm0
> > 	vmovups	%xmm0, (%rdi,%rax)
> > in the first loop.  Apparently all the MODE_VECTOR_INT and MODE_VECTOR_FLOAT
> > *mov<mode>_internal patterns (and various others) use misaligned_operand
> > to see if they should emit vmovaps or vmovups (etc.), so as suggested by
>
> That is intentional. In pre-haswell architectures splitting load is
> faster than 32 byte load.
But the above is a 16 byte unaligned load.  Furthermore, GCC supports
-mavx256-split-unaligned-load and can emit a 32 byte load either as a
single unaligned 32 byte load, or as a merge of two 16 byte unaligned
loads.  The patch affects only the cases where we were already emitting
16 byte or 32 byte unaligned loads rather than split loads.

	Jakub
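
To make the difference concrete, the folded form for the quoted f1 loop
would look roughly like the following; the register allocation here is
only illustrative, not exact compiler output:

	# today: separate 16 byte unaligned load, then the multiply
	vmovdqu	(%rsi,%rax), %xmm0
	vpmulld	%xmm1, %xmm0, %xmm0
	vmovups	%xmm0, (%rdi,%rax)

	# with the load folded into the VEX-encoded arithmetic insn,
	# which does not require alignment of its memory operand
	vpmulld	(%rsi,%rax), %xmm1, %xmm0
	vmovups	%xmm0, (%rdi,%rax)

A split 32 byte load, by contrast, stays as two 16 byte pieces (e.g. an
unaligned 16 byte load of the low half plus a vinsertf128 of the high
half) and is not affected by this.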