On Tue, May 8, 2018 at 11:11 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Mon, Apr 30, 2018 at 9:19 PM, Jakub Jelinek <ja...@redhat.com> wrote:
>> Hi!
>>
>> Before avx512vl we don't have a single instruction to do V2DImode and
>> V4DImode abs, but that isn't much different from say V4SImode before SSE3
>> where we also just emit a short sequence that is better than elementwise
>> expansion. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for
>> trunk?
>>
>> 2018-04-30 Jakub Jelinek <ja...@redhat.com>
>>
>>         PR target/85572
>>         * config/i386/i386.c (ix86_expand_sse2_abs): Handle E_V2DImode and
>>         E_V4DImode.
>>         * config/i386/sse.md (abs<mode>2): Use VI_AVX2 iterator instead of
>>         VI1248_AVX512VL_AVX512BW. Handle V2DImode and V4DImode if not
>>         TARGET_AVX512VL using ix86_expand_sse2_abs. Formatting fixes.
>>
>>         * g++.dg/other/sse2-pr85572-1.C: New test.
>>         * g++.dg/other/sse2-pr85572-2.C: New test.
>>         * g++.dg/other/sse4-pr85572-1.C: New test.
>>         * g++.dg/other/avx2-pr85572-1.C: New test.
>
> LGTM.
>
> Thanks,
> Uros.
>
>> --- gcc/config/i386/i386.c.jj 2018-04-25 15:09:29.895453703 +0200
>> +++ gcc/config/i386/i386.c 2018-04-30 18:31:56.027101932 +0200
>> @@ -49806,39 +49806,74 @@ ix86_expand_sse2_abs (rtx target, rtx in
>>
>>    switch (mode)
>>      {
>> +    case E_V2DImode:
>> +    case E_V4DImode:
>> +      /* For 64-bit signed integer X, with SSE4.2 use
>> +         pxor t0, t0; pcmpgtq X, t0; pxor t0, X; psubq t0, X.
>> +         Otherwise handle it similarly to V4SImode, except use 64 as W instead of
>> +         32 and use logical instead of arithmetic right shift (which is
>> +         unimplemented) and subtract. */
>> +      if (TARGET_SSE4_2)
>> +        {
>> +          tmp0 = gen_reg_rtx (mode);
>> +          tmp1 = gen_reg_rtx (mode);
>> +          emit_move_insn (tmp1, CONST0_RTX (mode));
>> +          if (mode == E_V2DImode)
>> +            emit_insn (gen_sse4_2_gtv2di3 (tmp0, tmp1, input));
>> +          else
>> +            emit_insn (gen_avx2_gtv4di3 (tmp0, tmp1, input));
} else {

>> +          tmp0 = expand_simple_binop (mode, LSHIFTRT, input,
>> +                                      GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1),
>> +                                      NULL, 0, OPTAB_DIRECT);
>> +          tmp0 = expand_simple_unop (mode, NEG, tmp0, NULL, false);

}

>> +      tmp1 = expand_simple_binop (mode, XOR, tmp0, input,
>> +                                  NULL, 0, OPTAB_DIRECT);
>> +      x = expand_simple_binop (mode, MINUS, tmp1, tmp0,
>> +                               target, 0, OPTAB_DIRECT);
>> +      break;

You could merge parts of the above code.

Uros.
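
For anyone following the two paths in the quoted hunk, here is a rough scalar model in C, one 64-bit lane at a time. It is only a sketch: the function names (mask_cmpgt, mask_shift, abs_tail, abs64_cmpgt, abs64_shift) are invented for illustration and appear nowhere in the patch, and results are returned as uint64_t so that the INT64_MIN case wraps instead of overflowing. It illustrates that the two branches differ only in how the sign mask is built, while the XOR and subtract tail can be shared.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 64-bit lane; not code from the patch.  */

/* SSE4.2 path: pcmpgtq of zero against X gives all-ones where 0 > X.  */
static uint64_t
mask_cmpgt (int64_t x)
{
  return (0 > x) ? UINT64_MAX : 0;
}

/* Fallback path: logical right shift by W - 1 (W = 64) gives 0 or 1,
   and negating that gives 0 or all-ones.  */
static uint64_t
mask_shift (int64_t x)
{
  uint64_t sign = (uint64_t) x >> 63;
  return (uint64_t) 0 - sign;
}

/* Shared tail: (X ^ mask) - mask, with wrapping unsigned arithmetic.  */
static uint64_t
abs_tail (int64_t x, uint64_t mask)
{
  return ((uint64_t) x ^ mask) - mask;
}

uint64_t
abs64_cmpgt (int64_t x)
{
  return abs_tail (x, mask_cmpgt (x));
}

uint64_t
abs64_shift (int64_t x)
{
  return abs_tail (x, mask_shift (x));
}
```

Both abs64_cmpgt (-5) and abs64_shift (-5) give 5, and INT64_MIN maps to 0x8000000000000000 in either variant, since the subtraction wraps.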