Uros,

Here is an updated patch which includes (1) a couple of changes proposed
by Richard in tree-vect-loop.c and (2) the back-end changes proposed
by you.

Is it OK for trunk?
Bootstrap and regression testing did not show any new failures.
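
To illustrate the transformation: a loop of roughly this shape (a
hand-reduced example in the spirit of the new tests; function and
variable names are just for illustration) is vectorized with a masked
store for p[i]:

--cut here--
void
foo (int *p, const int *q, int n)
{
  int i;

  for (i = 0; i < n; i++)
    if (q[i] > 0)
      p[i] = q[i];
}
--cut here--

optimize_mask_stores moves the masked store (and its producers, when
they have no other uses) into a block guarded by a test that the mask
is not all-zero, and the new cbranch<mode>4 expander lets that
vector-against-zero comparison expand to a ptest followed by a
conditional jump.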

ChangeLog:

2016-01-29  Yuri Rumyantsev  <ysrum...@gmail.com>

PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Add support for conditional
branch with vector comparison.
* config/i386/sse.md (VI48_AVX): New mode iterator.
(define_expand "cbranch<mode>4"): Add support for conditional branch
with vector comparison.
* tree-vect-loop.c (optimize_mask_stores): New function.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores, after the vec_info is destroyed.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macros.
(optimize_mask_stores): Add prototype.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-mask-store-move-1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move1.c: Likewise.
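
For reference, the guard on the AVX2 testcase then compiles down to
something like this (an illustrative sketch only; the exact registers,
labels and masked-store instruction depend on the testcase and flags):

--cut here--
        vptest  %ymm1, %ymm1    # ZF is set iff all mask lanes are zero
        je      .L4             # mask is all-zero: skip the masked store
        vpmaskmovd      %ymm0, %ymm1, (%rax)
.L4:
--cut here--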

2016-01-29 15:26 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
> On Fri, Jan 29, 2016 at 1:20 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Uros,
>>
>> Thanks for your comments.
>> I deleted the swap of operands as you suggested.
>> Let me explain my point in adding support for conditional branches
>> with vector comparison.
>> This feature is used to put vectorized masked stores and their
>> producers under a guard that checks that the mask is not zero, i.e.
>> if the mask, which is the result of other vector computations, is
>> zero, we don't need to execute the corresponding masked store and its
>> producers if they don't have other uses. It means that only integer
>> 128-bit and 256-bit vectors must be accepted as operands of cbranch.
>> I did not introduce a new iterator but simply used the existing
>> iterator VI48_AVX2. BTW you proposed to add a new iterator VI_AVX,
>> but it would be better to add VI48_AVX as
>>
>> (define_mode_iterator VI48_AVX
>>   [(V4SI "TARGET_AVX") (V2DI "TARGET_AVX")
>>    (V8SI "TARGET_AVX") (V4DI "TARGET_AVX")])
>>
>> I also don't think that we need to add support in expand_compare since
>> such comparisons are not generated.
>
> OK with me. If there is no need for a cstore pattern, then the
> comparison can be integrated with the existing code in expand_branch by
> using "goto simple", as is already the case there.
>
> BR,
> Uros.
>
>> 2016-01-28 20:08 GMT+03:00 Uros Bizjak <ubiz...@gmail.com>:
>>> Yuri,
>>>
>>> please find attached a target-dependent patch that illustrates my
>>> review remarks. The patch is lightly tested, and it produces desired
>>> ptest insns on the testcases you provided.
>>>
>>> Some further remarks:
>>>
>>> +      tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
>>> +      if (code == EQ)
>>> +        tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
>>> +                                    gen_rtx_LABEL_REF (VOIDmode, label),
>>> +                                    pc_rtx);
>>> +      else
>>> +        tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
>>> +                                    pc_rtx,
>>> +                                    gen_rtx_LABEL_REF (VOIDmode, label));
>>> +      emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
>>> +      return;
>>>
>>> The above code is IMO wrong. You don't need to swap the arms of the
>>> target, since "code" will generate je or jne. Please see the attached
>>> patch.
>>>
>>> BTW: Maybe we can introduce a corresponding cstore pattern to use ptest
>>> in order to more efficiently vectorize code like:
>>>
>>> --cut here--
>>> int a[256];
>>>
>>> int foo (void)
>>> {
>>>   int ret = 0;
>>>   int i;
>>>
>>>   for (i = 0; i < 256; i++)
>>>     {
>>>       if (a[i] != 0)
>>>         ret = 1;
>>>     }
>>>   return ret;
>>> }
>>> --cut here--
>>>
>>> Uros.

Attachment: PR68542.patch.3