On Mon, Jan 4, 2016 at 4:11 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
> On Sat, Jan 2, 2016 at 10:26 AM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Sat, Jan 2, 2016 at 3:58 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On January 2, 2016 11:32:33 AM GMT+01:00, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>> On Thu, Dec 31, 2015 at 4:29 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>> On Thu, Dec 31, 2015 at 1:14 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>>>>> On Wed, Dec 30, 2015 at 9:53 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>>> SSE vector arithmetic and logic instructions only accept aligned memory
>>>>>>> operands.  This patch adds vector_memory_operand and the "Bm" constraint
>>>>>>> for aligned SSE memory operands.  They are applied to the SSE any_logic
>>>>>>> patterns.
>>>>>>>
>>>>>>> OK for trunk and release branches if there are no regressions?
>>>>>>
>>>>>> This patch is just papering over a deeper problem, as Jakub said in the PR [1]:
>>>>>>
>>>>>> --q--
>>>>>> GCC uses the ix86_legitimate_combined_insn target hook to disallow
>>>>>> misaligned memory into certain SSE instructions.
>>>>>> (subreg:V4SI (reg:TI 245 [ MEM[(const struct bitset &)FeatureEntry_21 + 8] ]) 0)
>>>>>> is not misaligned memory, it is a subreg of a pseudo register, so it is fine.
>>>>>> If the replacement of the pseudo register with memory happens in some
>>>>>> other pass, then it probably either should use the
>>>>>> legitimate_combined_insn target hook or some other one.  I think we
>>>>>> already have a PR where that happens during live range shrinking.
>>>>>> --/q--
>>>>>>
>>>>>> Please figure out where memory replacement happens.  There are several
>>>>>> other SSE insns (please grep the .md for the "ssememalign" attribute) that
>>>>>> are affected by this problem, so fixing a couple of patterns won't
>>>>>> solve the problem completely.
>>>>>
>>>>> LRA turns
>>>>>
>>>>> (insn 64 63 108 6 (set (reg:V4SI 148 [ vect__28.85 ])
>>>>>         (xor:V4SI (reg:V4SI 149)
>>>>>             (subreg:V4SI (reg:TI 147 [ MEM[(const struct bitset &)FeatureEntry_2(D)] ]) 0)))
>>>>>      foo.ii:26 3454 {*xorv4si3}
>>>>>      (expr_list:REG_DEAD (reg:V4SI 149)
>>>>>         (expr_list:REG_DEAD (reg:TI 147 [ MEM[(const struct bitset &)FeatureEntry_2(D)] ])
>>>>>             (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>>>                         (const_int -16 [0xfffffffffffffff0]))
>>>>>                     [3 MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>>>>                 (nil)))))
>>>>>
>>>>> into
>>>>>
>>>>> (insn 64 63 108 6 (set (reg:V4SI 21 xmm0 [orig:148 vect__28.85 ] [148])
>>>>>         (xor:V4SI (reg:V4SI 21 xmm0 [149])
>>>>>             (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>>>>>                 [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])))
>>>>>      foo.ii:26 3454 {*xorv4si3}
>>>>>      (expr_list:REG_EQUIV (mem/c:V4SI (plus:DI (reg/f:DI 20 frame)
>>>>>                 (const_int -16 [0xfffffffffffffff0]))
>>>>>             [3 MEM[(unsigned int *)&D.2851]+0 S16 A128])
>>>>>         (nil)))
>>>>>
>>>>> since
>>>>>
>>>>> (mem:V4SI (reg/v/f:DI 4 si [orig:117 FeatureEntry ] [117])
>>>>>     [6 MEM[(const struct bitset &)FeatureEntry_2(D)]+0 S16 A32])
>>>>>
>>>>> satisfies the "m" constraint.  I don't think LRA should call
>>>>> ix86_legitimate_combined_insn to validate constraints on an instruction.
>>>>
>>>> Hm...
>>>>
>>>> If LRA doesn't assume that the generic "m" constraint implies at least
>>>> natural alignment of the propagated operand, then your patch is the way to go.
>>>
>>> I don't think it even considers alignment.
>>> Archs where alignment validity depends on the actual instruction
>>> should model this with proper constraints.
>>>
>>>> But in this case, *every* SSE vector memory constraint should be
>>>> changed to Bm.
>>>
>>> I'd say so ...
>>
>> The "Bm" constraint should be applied only to non-move SSE
>> instructions with a 16-byte memory operand.
>>
>
> Here are 3 patches which implement it.  There is one exception
> on SSE *mov<mode>_internal.  With Bm, LRA will crash, which
> may be an LRA bug.  I used m as a workaround.
>
> Tested on x86-64 without regressions.  OK for trunk?
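
Just so we are looking at the same thing: as I understand your patches, the
new predicate and constraint boil down to roughly the following (a sketch
from memory, not the authoritative text -- the exact definitions, including
whether AVX targets are allowed to take misaligned operands here, are the
ones in your patches):

  ;; Sketch of a predicate that only accepts sufficiently aligned vector memory.
  (define_predicate "vector_memory_operand"
    (and (match_operand 0 "memory_operand")
         (ior (match_test "TARGET_AVX")
              (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)"))))

  ;; Sketch of the corresponding constraint wrapping that predicate.
  (define_constraint "Bm"
    "@internal Vector memory operand."
    (match_operand 0 "vector_memory_operand"))
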
Looking at the comment in Patch 3, I'd say let's keep the *mov<mode>_internal
constraints unchanged.  But it looks to me that we finally have to relax the

  if ((TARGET_AVX || TARGET_IAMCU)
      && (misaligned_operand (operands[0], <MODE>mode)
          || misaligned_operand (operands[1], <MODE>mode)))

condition to allow unaligned moves for all targets, not only AVX and IAMCU
(see the rough sketch after my signature).  The rationale for this decision
is that if the RA can't satisfy the Bm constraint, it can load the value into
an XMM register.  This load will go through SSE *mov<mode>_internal, so an
unaligned move has to be generated.  But please double-check the changes.

In Patch 2, I have found:

@@ -2041,10 +2041,10 @@
    (set_attr "mode" "<MODE>")])

 (define_insn "*ieee_smax<mode>3"
-  [(set (match_operand:VF 0 "register_operand" "=v,v")
+  [(set (match_operand:VF 0 "register_operand" "=x,v")
 	(unspec:VF
 	  [(match_operand:VF 1 "register_operand" "0,v")
-	   (match_operand:VF 2 "nonimmediate_operand" "vm,vm")]
+	   (match_operand:VF 2 "vector_operand" "xm,vm")]
 	  UNSPEC_IEEE_MAX))]
   "TARGET_SSE"
   "@

The "xm,vm" constraint can't be right.  It should be "xBm,vm".

Reading your other explanation about 1-, 2-, 4- and 8-byte alignment checks,
I agree your patch is the way to go.  Please drop Patch 3 and retest Patches
1 and 2 with *mov<mode>_internal changed as proposed above.

Also, please ask Kirill for a review, as the patch touches his area.

Uros.
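
P.S.: To illustrate the relaxation I mean above, the TYPE_SSEMOV case in
*mov<mode>_internal could simply drop the target gate along these lines
(an untested sketch; the exact surrounding switch in sse.md may differ):

  /* Emit an unaligned move whenever either operand is known to be
     misaligned, on every target, not just AVX/IAMCU.  */
  if (misaligned_operand (operands[0], <MODE>mode)
      || misaligned_operand (operands[1], <MODE>mode))
    return "%vmovups\t{%1, %0|%0, %1}";
  else
    return "%vmovaps\t{%1, %0|%0, %1}";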