Uros,

Here is the updated patch, which includes (1) a couple of changes proposed by Richard in tree-vect-loop.c and (2) the back-end changes proposed by you.
Is it OK for trunk? Bootstrap and regression testing did not show any new failures.

ChangeLog:

2016-01-29  Yuri Rumyantsev  <ysrum...@gmail.com>

	PR middle-end/68542
	* config/i386/i386.c (ix86_expand_branch): Add support for
	conditional branch with vector comparison.
	* config/i386/sse.md (Vi48_AVX): New mode iterator.
	(define_expand "cbranch<mode>4"): Add support for conditional
	branch with vector comparison.
	* tree-vect-loop.c (optimize_mask_stores): New function.
	* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
	has_mask_store field of vect_info.
	* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores
	for vectorized loops having masked stores after vec_info destroy.
	* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field
	and corresponding macros.
	(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-mask-store-move-1.c: New test.
	* gcc.target/i386/avx2-vect-mask-store-move1.c: Likewise.

2016-01-29 15:26 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
> On Fri, Jan 29, 2016 at 1:20 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Uros,
>>
>> Thanks for your comments.
>> I deleted the swap of operands as you suggested.
>> Let me explain my point in adding support for conditional branches
>> with vector comparison.
>> This feature is used to put vectorized masked stores and their
>> producers under a guard that checks that the mask is not zero, i.e.
>> if the mask, which is the result of other vector computations, is
>> zero, we don't need to execute the corresponding masked store and its
>> producers if they don't have other uses. It means that only integer
>> 128-bit and 256-bit vectors must be accepted as operands of cbranch.
>> I did not introduce a new iterator but simply used the existing
>> iterator V48_AVX2. BTW, you proposed to add a new iterator VI_AVX,
>> but it would be better to add Vi48_AVX as
>>
>> (define_mode_iterator Vi48_AVX
>>   [(V4SI "TARGET_AVX") (V2DI "TARGET_AVX")
>>    (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")])
>>
>> I also don't think that we need to add support in expand_compare,
>> since such comparisons are not generated.
>
> OK with me. If there is no need for a cstore pattern, then the
> comparison can be integrated with the existing code in expand_branch
> by using "goto simple", as is already the case there.
>
> BR,
> Uros.
>
>> 2016-01-28 20:08 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
>>> Yuri,
>>>
>>> Please find attached a target-dependent patch that illustrates my
>>> review remarks. The patch is lightly tested, and it produces the
>>> desired ptest insns on the testcases you provided.
>>>
>>> Some further remarks:
>>>
>>> +  tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
>>> +  if (code == EQ)
>>> +    tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
>>> +				gen_rtx_LABEL_REF (VOIDmode, label), pc_rtx);
>>> +  else
>>> +    tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
>>> +				pc_rtx, gen_rtx_LABEL_REF (VOIDmode, label));
>>> +  emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
>>> +  return;
>>>
>>> The above code is IMO wrong. You don't need to swap the arms of the
>>> branch, since "code" will generate je or jne. Please see the
>>> attached patch.
>>>
>>> BTW: Maybe we can introduce a corresponding cstore pattern to use
>>> ptest in order to more efficiently vectorize code like:
>>>
>>> --cut here--
>>> int a[256];
>>>
>>> int foo (void)
>>> {
>>>   int ret = 0;
>>>   int i;
>>>
>>>   for (i = 0; i < 256; i++)
>>>     {
>>>       if (a[i] != 0)
>>>         ret = 1;
>>>     }
>>>   return ret;
>>> }
>>> --cut here--
>>>
>>> Uros.
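As context for the guard Yuri describes in the quoted thread, here is a minimal sketch of the kind of loop optimize_mask_stores targets (an illustrative example with made-up array and function names, not the attached testcase):

--cut here--
int a[256], b[256], c[256];

void
bar (void)
{
  int i;

  /* The comparison produces a vector mask, and the conditional store
     is vectorized as a masked store.  */
  for (i = 0; i < 256; i++)
    if (c[i] > 0)
      a[i] = b[i] * c[i];
}

/* Conceptually, after the transformation each vector iteration becomes

     mask = (c[i..i+7] > 0);
     if (mask != {0, ..., 0})          // cbranch on the vector comparison,
       masked_store (a + i, mask,      // expanded via ptest + jcc
                     b[i..i+7] * c[i..i+7]);

   so the masked store and its producers are skipped entirely when the
   mask is all-zero.  */
--cut here--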
Attachment: PR68542.patch.3