On Wed, Dec 4, 2013 at 11:15 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote:
>> When "trap on unaligned" is activated, a program should not trap in a
>> different way due to compiler transformations.  The optimized and
>> unoptimized code should run (and trap) in the same way.
>
> Ok.
>
> Note, it seems I have missed an i386/sse-1.c test regression (apparently
> because an earlier version of the patch failed that test and I compared
> testresults against that instead of an earlier regtest), and extra
> testing that recorded all insns rejected by the new return false in the
> ix86_legitimate_combined_insn hook showed that sse4_1-movntdqa.c
> regressed as well.  So the attached updated patch fixes those two.
> I had missed ssememalign on the sse_loadlpd insn, where none of the
> alternatives #UD on a misaligned operand, and previously also skipped
> sse2_load[lh]pd because I was afraid about the splitters, but the
> splitter creates a DFmode store, which is always fine misaligned.
>
> Incremental diff is:
>
> --- gcc/config/i386/i386.c	2013-12-02 19:57:39.116438744 +0100
> +++ gcc/config/i386/i386.c	2013-12-04 10:39:43.614269591 +0100
> @@ -32563,6 +32543,15 @@
>        nargs = 1;
>        klass = load;
>        memory = 0;
> +      switch (icode)
> +	{
> +	case CODE_FOR_sse4_1_movntdqa:
> +	case CODE_FOR_avx2_movntdqa:
> +	  aligned_mem = true;
> +	  break;
> +	default:
> +	  break;
> +	}
>        break;
>      case VOID_FTYPE_PV2SF_V4SF:
>      case VOID_FTYPE_PV4DI_V4DI:
> --- gcc/config/i386/sse.md	2013-12-02 18:02:00.325523050 +0100
> +++ gcc/config/i386/sse.md	2013-12-04 11:08:48.589128840 +0100
> @@ -5278,6 +5278,7 @@
>     %vmovlps\t{%2, %0|%q0, %2}"
>    [(set_attr "isa" "noavx,avx,noavx,avx,*")
>     (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
> +   (set_attr "ssememalign" "64")
>     (set_attr "length_immediate" "1,1,*,*,*")
>     (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
>     (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
> @@ -7066,6 +7067,7 @@
>     #"
>    [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
>     (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov")
> +   (set_attr "ssememalign" "64")
>     (set_attr "prefix_data16" "1,*,*,*,*,*,*")
>     (set_attr "prefix" "orig,vex,orig,vex,*,*,*")
>     (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")])
> @@ -7134,6 +7136,7 @@
> 	      (const_string "imov")
> 	   ]
> 	   (const_string "ssemov")))
> +   (set_attr "ssememalign" "64")
>     (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*")
>     (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*")
>     (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*")
>
>> However, I don't think ix86_expand_special_args_builtin is needed.  It
>> is the duty of the programmer to pass a properly aligned address to
>> these builtins.  The compiler shouldn't magically "fix" invalid code,
>> in the same way as it shouldn't "break" valid code as in the case
>> above.
>
> What I'm trying to do in ix86_expand_special_args_builtin is not in any
> way fixing invalid code.  The problem is that those builtins don't have
> a memory as an argument, they have a pointer, and the MEM is created by
> the compiler from scratch.  As it uses just gen_rtx_MEM, that always
> means just BITS_PER_UNIT alignment on those MEMs, so the problem is
> that the combiner hook will reject any changes whatsoever on such
> instructions.  get_pointer_alignment (added in the patch) adds there
> the alignment info the compiler can find out (e.g. if the pointer
> points to a decl with certain alignment etc.), but if it is e.g.
> an argument of a function, it will likely still return BITS_PER_UNIT
> even when in valid code the pointer will always be properly aligned
> (you'd need to use __builtin_assume_aligned or similar to force the
> alignment there).  I went through the special builtins for which
> gen_rtx_MEM is called and they apparently fall into just two
> categories: one is the various builtins for unaligned vector loads or
> for < 16 byte load/store instructions (type 5 in the AVX spec, so no
> #UD on misalignment, at most #AC if enabled), and the other is
> non-temporal loads/stores, which both pre-AVX and AVX+ require to be
> strictly aligned.  Only for the latter category am I forcing the mode
> alignment, because even if the compiler can't prove correct alignment
> based on CCP etc., a valid program must have such MEMs properly
> aligned, and if we create MEMs with smaller alignment, the combine hook
> will always just punt on them.
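Just to restate the non-temporal case in user terms, here is a minimal
sketch (mine, not from the patch; the function names are made up),
assuming SSE4.1:

#include <smmintrin.h>

__m128i
load_nt (const __m128i *p)
{
  /* MOVNTDQA needs a 16-byte aligned operand, but p is just a function
     argument, so get_pointer_alignment only sees BITS_PER_UNIT here.  */
  return _mm_stream_load_si128 ((__m128i *) p);
}

__m128i
load_nt_hinted (const void *p)
{
  /* With __builtin_assume_aligned the compiler can actually prove the
     16-byte alignment of the MEM it creates for the builtin.  */
  const __m128i *q = __builtin_assume_aligned (p, 16);
  return _mm_stream_load_si128 ((__m128i *) q);
}

In the first variant a valid program must still pass an aligned pointer,
so forcing at least GET_MODE_ALIGNMENT on such MEMs looks right to me.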
Thanks for the explanation! Maybe you should add a short comment like
the above in the code.

> So, ok for trunk this way?  I guess the 4.8 patch should wait some time
> before being backported.
>
> 2013-12-04  Jakub Jelinek  <ja...@redhat.com>
> 	    Uros Bizjak  <ubiz...@gmail.com>
>
> 	PR target/59163
> 	* config/i386/i386.c (ix86_legitimate_combined_insn): If for
> 	!TARGET_AVX there is a misaligned MEM operand with vector mode
> 	and get_attr_ssememalign is 0, return false.
> 	(ix86_expand_special_args_builtin): Add get_pointer_alignment
> 	computed alignment and for non-temporal loads/stores also
> 	at least GET_MODE_ALIGNMENT as MEM_ALIGN.
> 	* config/i386/sse.md
> 	(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,
> 	<sse>_storeu<ssemodesuffix><avxsizesuffix>,
> 	<sse2_avx_avx512f>_loaddqu<mode><mask_name>,
> 	<sse2_avx_avx512f>_storedqu<mode>, <sse3>_lddqu<avxsizesuffix>,
> 	sse_vmrcpv4sf2, sse_vmrsqrtv4sf2, sse2_cvtdq2pd, sse_movhlps,
> 	sse_movlhps, sse_storehps, sse_loadhps, sse_loadlps,
> 	*vec_interleave_highv2df, *vec_interleave_lowv2df,
> 	*vec_extractv2df_1_sse, sse2_movsd, sse4_1_<code>v8qiv8hi2,
> 	sse4_1_<code>v4qiv4si2, sse4_1_<code>v4hiv4si2,
> 	sse4_1_<code>v2qiv2di2, sse4_1_<code>v2hiv2di2,
> 	sse4_1_<code>v2siv2di2, sse4_2_pcmpestr,
> 	*sse4_2_pcmpestr_unaligned, sse4_2_pcmpestri, sse4_2_pcmpestrm,
> 	sse4_2_pcmpestr_cconly, sse4_2_pcmpistr,
> 	*sse4_2_pcmpistr_unaligned, sse4_2_pcmpistri, sse4_2_pcmpistrm,
> 	sse4_2_pcmpistr_cconly): Add ssememalign attribute.
> 	* config/i386/i386.md (ssememalign): New define_attr.
>
> 	* g++.dg/torture/pr59163.C: New test.

This is OK for 4.8+.

Thanks,
Uros.