http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59163
--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Jakub Jelinek from comment #12)
> > > Created attachment 31332 [details]
> > > gcc49-pr59163.patch
> > >
> > > So like this?
> >
> > Yes, with adjusted comment in ix86_legitimate_combined_insn.
> >
> > IIRC, unaligned moves won't be propagated during or after reload, so it
> > looks to me that the approach is correct.
>
> Running the testsuite with your patch applied exposed a minor problem:
>
> FAIL: gcc.target/i386/sse-1.c scan-assembler-not movaps
>
> movlps/movhps and movlpd/movhpd can also handle unaligned operands (please
> see ix86_expand_vector_move_misalign). We should simply tag instructions
> that operate on unaligned operands (attribute type = ssemovu) and check type
> attribute instead.
>
> The proposed approach would mean to change all scheduler and attribute
> calculation checks from "ssemov" to "ssemov,ssemovu", but this would be a
> simple mechanical change.

Yeah, I've noticed that too, plus the dumps I've used to note what
instructions have been rejected by the patch show that UNSPEC_LDDQU would
need to be treated like UNSPEC_LOADU.

The patch made difference for 32-bit:
  gcc.target/i386/sse-1.c (as you write above)
  gcc.dg/torture/pr18582-1.c (UNSPEC_LDDQU)
  gcc.target/i386/sse4_1-movntdqa.c (UNSPEC_MOVNTDQA)
and 64-bit also
  g++.dg/torture/pr59163.C (desirable)

Now, for movntdqa, I think it accepts only aligned memory, but the MEM in
there is supposed to be aligned and is created by

    if (i == memory)
      {
        /* This must be the memory operand.  */
        op = ix86_zero_extend_to_Pmode (op);
        op = gen_rtx_MEM (mode, op);
        gcc_assert (GET_MODE (op) == mode
                    || GET_MODE (op) == VOIDmode);
      }

and there is similar code for builtins that store. Supposedly for this we
should use get_pointer_alignment (arg) and at least

    set_mem_align (op, get_pointer_alignment (arg));

if it is larger than MEM_ALIGN (op).
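To make the "only if it is larger than MEM_ALIGN (op)" condition concrete, here is a minimal standalone sketch in plain C. It is not GCC code: `struct toy_mem` and `toy_raise_mem_align` are hypothetical stand-ins for an RTL MEM and for the proposed `set_mem_align` call, and the `ptr_align_bits` argument plays the role of whatever `get_pointer_alignment (arg)` would report. Alignments are assumed to be in bits.

```c
#include <assert.h>

/* Hypothetical stand-in for an RTL MEM; only its known alignment
   (in bits) is modelled here.  */
struct toy_mem
{
  unsigned int align_bits;
};

/* Raise the alignment recorded on MEM to PTR_ALIGN_BITS, but never
   lower it.  This models the "if it is larger than MEM_ALIGN (op)"
   condition; it is an illustration, not the actual patch.  */
void
toy_raise_mem_align (struct toy_mem *mem, unsigned int ptr_align_bits)
{
  /* Assumption: alignments are nonzero powers of two.  */
  assert (ptr_align_bits != 0
          && (ptr_align_bits & (ptr_align_bits - 1)) == 0);

  if (ptr_align_bits > mem->align_bits)
    mem->align_bits = ptr_align_bits;
}
```

With this guard, a MEM that starts at byte alignment (8 bits) and whose pointer is known to be 128-bit aligned ends up recorded as 128, while a later, weaker 32-bit estimate leaves the 128 in place.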

The gcc_assert doesn't make any sense to me: the result of
gen_rtx_MEM (mode, op) will always have GET_MODE (op) == mode, so there is
no need to assert that, and it will never have VOIDmode.

Now, if we could easily find out which of the builtins assume aligned memory
(and to what extent), we should also set that alignment on the MEM, because,
say, using _mm_stream_load_si128 with memory that is not 128-bit aligned is
a user error, so GCC should be able to assume A128 there. I'd say the
sse-1.c case is similar, isn't it?
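
For reference, this is roughly what "128-bit aligned" means from the user's side. It is a portable sketch that deliberately does not call `_mm_stream_load_si128` itself, so it builds without SSE4.1; `is_128bit_aligned`, `checked_128bit` and `toy_buf` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A buffer the user has explicitly aligned to 16 bytes (128 bits),
   as an aligned-only load would require.  */
_Alignas (16) unsigned char toy_buf[32];

/* Return nonzero if P is aligned to a 16-byte (128-bit) boundary.  */
int
is_128bit_aligned (const void *p)
{
  return ((uintptr_t) p & 15u) == 0;
}

/* Debug-style check a caller could apply before handing P to a
   builtin that assumes 128-bit alignment.  Passing a misaligned
   pointer is treated as a caller bug, which is the assumption the
   compiler would be allowed to make.  */
const void *
checked_128bit (const void *p)
{
  assert (is_128bit_aligned (p));
  return p;
}
```

Here `toy_buf` passes the check, while `toy_buf + 1` does not; the latter is the kind of argument that would count as user error for an aligned-only builtin.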