On Wed, Dec 4, 2013 at 11:15 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote:
>> When "trap on unaligned" is activated, a program should not trap
>> differently due to compiler transformations.  The optimized and
>> unoptimized code should run (and trap) in the same way.
>
> Ok.
>
> Note, it seems I had missed an i386/sse-1.c test regression (apparently
> because an earlier version of the patch failed that test and I had compared
> test results against that instead of against an earlier regtest), and extra
> testing that recorded all insns rejected by the new return false in the
> ix86_legitimate_combined_insn hook showed that sse4_1-movntdqa.c regressed
> too.  So the attached updated patch fixes those two.  I had missed the
> ssememalign attribute on the sse_loadlpd insn, where none of the
> alternatives #UD on a misaligned operand, and had previously also skipped
> sse2_load[lh]pd because I was worried about the splitters, but the splitter
> creates a DFmode store, which is always fine misaligned.
>
> Incremental diff is:
> --- gcc/config/i386/i386.c      2013-12-02 19:57:39.116438744 +0100
> +++ gcc/config/i386/i386.c      2013-12-04 10:39:43.614269591 +0100
> @@ -32563,6 +32543,15 @@
>        nargs = 1;
>        klass = load;
>        memory = 0;
> +      switch (icode)
> +       {
> +       case CODE_FOR_sse4_1_movntdqa:
> +       case CODE_FOR_avx2_movntdqa:
> +         aligned_mem = true;
> +         break;
> +       default:
> +         break;
> +       }
>        break;
>      case VOID_FTYPE_PV2SF_V4SF:
>      case VOID_FTYPE_PV4DI_V4DI:
> --- gcc/config/i386/sse.md      2013-12-02 18:02:00.325523050 +0100
> +++ gcc/config/i386/sse.md      2013-12-04 11:08:48.589128840 +0100
> @@ -5278,6 +5278,7 @@
>     %vmovlps\t{%2, %0|%q0, %2}"
>    [(set_attr "isa" "noavx,avx,noavx,avx,*")
>     (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
> +   (set_attr "ssememalign" "64")
>     (set_attr "length_immediate" "1,1,*,*,*")
>     (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
>     (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
> @@ -7066,6 +7067,7 @@
>     #"
>    [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
>     (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov")
> +   (set_attr "ssememalign" "64")
>     (set_attr "prefix_data16" "1,*,*,*,*,*,*")
>     (set_attr "prefix" "orig,vex,orig,vex,*,*,*")
>     (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")])
> @@ -7134,6 +7136,7 @@
>               (const_string "imov")
>            ]
>            (const_string "ssemov")))
> +   (set_attr "ssememalign" "64")
>     (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*")
>     (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*")
>     (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*")
>
>> However, I don't think the ix86_expand_special_args_builtin change is
>> needed.  It is the duty of the programmer to pass a properly aligned
>> address to these builtins.  The compiler shouldn't magically "fix"
>> invalid code, just as it shouldn't "break" valid code as in the case
>> above.
>
> What I'm trying to do in ix86_expand_special_args_builtin is not in any way
> fixing invalid code.  The problem is that those builtins don't take a
> memory argument, they take a pointer, and the MEM is created by the
> compiler from scratch.  As it uses just gen_rtx_MEM, those MEMs always get
> just BITS_PER_UNIT alignment, so the combiner hook will reject any changes
> whatsoever on such instructions.  get_pointer_alignment (added in the
> patch) adds alignment info from whatever the compiler can find out (e.g. if
> the pointer points to a decl with a certain alignment), but if it is e.g.
> an argument of a function, it will still likely return BITS_PER_UNIT even
> when in valid code it will always be properly aligned (you'd need to use
> __builtin_assume_aligned or similar to force the alignment there).  I went
> through the special builtins for which gen_rtx_MEM is called and they
> apparently fall into just two categories: one is the various builtins for
> unaligned vector loads or for < 16 byte load/store instructions (type 5 in
> the AVX spec, so no #UD on misalignment, at most #AC if enabled), and the
> other is non-temporal loads/stores, which require strict alignment both
> pre-AVX and with AVX+.  Only for the latter category am I forcing the mode
> alignment, because even if the compiler can't prove correct alignment based
> on CCP etc., a valid program must have such MEMs properly aligned, and if
> we create MEMs with smaller alignment, the combine hook will just always
> punt on them.

Thanks for the explanation! Maybe you should add a short comment like
the above in the code.

> So, ok for trunk this way?  I guess the 4.8 patch should wait some time
> before being backported.
>
> 2013-12-04  Jakub Jelinek  <ja...@redhat.com>
>             Uros Bizjak  <ubiz...@gmail.com>
>
>         PR target/59163
>         * config/i386/i386.c (ix86_legitimate_combined_insn): If for
>         !TARGET_AVX there is misaligned MEM operand with vector mode
>         and get_attr_ssememalign is 0, return false.
>         (ix86_expand_special_args_builtin): Add get_pointer_alignment
>         computed alignment and for non-temporal loads/stores also
>         at least GET_MODE_ALIGNMENT as MEM_ALIGN.
>         * config/i386/sse.md
>         (<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
>         <sse>_storeu<ssemodesuffix><avxsizesuffix>,
>         <sse2_avx_avx512f>_loaddqu<mode><mask_name>,
>         <sse2_avx_avx512f>_storedqu<mode>, <sse3>_lddqu<avxsizesuffix>,
>         sse_vmrcpv4sf2, sse_vmrsqrtv4sf2, sse2_cvtdq2pd, sse_movhlps,
>         sse_movlhps, sse_storehps, sse_loadhps, sse_loadlps,
>         *vec_interleave_highv2df, *vec_interleave_lowv2df,
>         *vec_extractv2df_1_sse, sse2_movsd, sse4_1_<code>v8qiv8hi2,
>         sse4_1_<code>v4qiv4si2, sse4_1_<code>v4hiv4si2,
>         sse4_1_<code>v2qiv2di2, sse4_1_<code>v2hiv2di2,
>         sse4_1_<code>v2siv2di2, sse4_2_pcmpestr, *sse4_2_pcmpestr_unaligned,
>         sse4_2_pcmpestri, sse4_2_pcmpestrm, sse4_2_pcmpestr_cconly,
>         sse4_2_pcmpistr, *sse4_2_pcmpistr_unaligned, sse4_2_pcmpistri,
>         sse4_2_pcmpistrm, sse4_2_pcmpistr_cconly): Add ssememalign attribute.
>         * config/i386/i386.md (ssememalign): New define_attr.
>
>         * g++.dg/torture/pr59163.C: New test.

This is OK for 4.8+.

Thanks,
Uros.
