Re: [RFC] Try vector as a new representation for vector masks

Richard Biener Tue, 01 Sep 2015 07:31:36 -0700

On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
> On 27 Aug 09:55, Richard Biener wrote:
>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> 
>> wrote:
>> >
>> > Yes, I want to try it. But getting rid of bool patterns would mean
>> > support for all targets currently supporting vec_cond. Would it be OK
>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>
>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be 
>> able
>> to figure out the correct vector type (or mask type) from the uses.  
>> Currently
>> it simply looks at the stmts LHS type but as all stmt operands already have
>> vector types it can as well compute the result type from those.  We'd want to
>> have a helper function that does this result type computation as I figure it
>> will be needed in multiple places.
>>
>> This is now on my personal TODO list (but that's already quite long for GCC 
>> 6),
>> so if you manage to get to that...  see
>> tree-vect-loop.c:vect_determine_vectorization_factor
>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>> vector type set from data-ref analysis already - there 'bool' loads
>> correctly get
>> VNQImode).  There is a basic-block / SLP part as well that would need to use
>> the helper function (eventually with some SLP discovery order issue).
>>
>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>> > MASK_STORE).
>>
>> You can certainly try that first, but as soon as you hit complications with
>> needing to adjust bool patterns then I'd rather get rid of them.
>>
>> >
>> > I can directly build a vector type with specified mode to avoid it. Smth. 
>> > like:
>> >
>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size);
>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>
>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>> target attribute
>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>> on the active target).  There would also be no way for the user to
>> declare vector<bool>
>> in source (which is good because of that target attribute issue...).
>>
>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>> and adjusting
>> truth_type_for is the way to go.
>>
>> I suggest you try modifying those parts first according to this scheme
>> that will most
>> likely uncover issues we missed.
>>
>> Thanks,
>> Richard.
>>
>
> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE.  
> There were no major issues (for now).
>
> build_truth_vector_type and get_mask_type_for_scalar_type were added to build 
> a mask type.  It is always a vector of bools but its mode is determined by a 
> target using number of units and currently used vector length.
>
> As previously I fixed if-conversion to apply boolean masks for loads and 
> stores which automatically disables bool patterns for them and flow goes by a 
> mask path.  Vectorization factor computation is fixed to have a separate 
> computation for mask types.  Comparison is now handled separately by 
> vectorizer and is vectorized into vector comparison.
>
> Optabs for masked loads and stores were transformed into convert optabs.  Now 
> it is checked using both value and mask modes.
>
> Optabs for comparison were added.  These are also convert optabs checking 
> value and result type.
>
> I had to introduce significant number of new patterns in i386 target to 
> support new optabs.  The reason was vector compare was never expanded 
> separately and always was a part of a vec_cond expansion.


Indeed.

> As a result it's possible to use the sage GIMPLE representation for both 
> vector and scalar masks target type.  Here is an example I used as a simple 
> test:
>
>   for (i=0; i<N; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f && t < 1.0e+2f)
>       if (c[i] != 0)
>         c[i] = 1;
>   }
>
> Produced vector GIMPLE (before expand):
>
>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 
> };
>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 
> });

Looks good to me.

> Produced assembler on AVX-512:
>
>         vmovups (%rdi), %zmm0
>         vcmpps  $25, %zmm5, %zmm0, %k1
>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>         vmovdqa32       %zmm4, (%rcx){%k1}
>
> Produced assembler on AVX-2:
>
>         vmovups (%rdx), %xmm1
>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>         vcmpltps        %ymm1, %ymm3, %ymm0
>         vcmpltps        %ymm5, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>         vpand   %ymm0, %ymm1, %ymm0
>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>
> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>
>         vmovups (%rdx), %xmm0
>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>         vcmpltps        %ymm0, %ymm6, %ymm1
>         vcmpltps        %ymm7, %ymm0, %ymm0
>         vpand   %ymm1, %ymm5, %ymm2
>         vpand   %ymm0, %ymm2, %ymm1
>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm1, %ymm0, %ymm0
>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>         vpandn  %ymm4, %ymm0, %ymm0
>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>
>
> For now I still don't disable bool patterns, thus new masks apply to masked 
> loads and stores only.  Patch is also not tested and tried on several small 
> tests only.  Could you please look at what I currently have and say if it's 
> in sync with your view on vector masking?

So apart from bool patterns and maybe implementation details (didn't
look too closely at the patch yet, maybe tomorrow), there is

+  /* Or a boolean vector type with the same element count
+     as the comparison operand types.  */
+  else if (TREE_CODE (type) == VECTOR_TYPE
+          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+    {

so we now allow both, integer typed and boolean typed comparison
results?  I was hoping that on GIMPLE
we can canonicalize to a single form, the boolean one and for the
"old" style force the use of VEC_COND exprs
(which we did anyway, AFAIK).  The comparison in the VEC_COND would
still have vector bool result type.

I expect the vectorization factor changes to "vanish" if we remove
bool patterns and re-org vector type deduction

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-09-01  Ilya Enkovich  <enkovich....@gmail.com>
>
>         * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
>         op_true and op_false.
>         (ix86_int_cmp_code_to_pcmp_immediate): New.
>         (ix86_fp_cmp_code_to_pcmp_immediate): New.
>         (ix86_cmp_code_to_pcmp_immediate): New.
>         (ix86_expand_mask_vec_cmp): New.
>         (ix86_expand_fp_vec_cmp): New.
>         (ix86_expand_int_sse_cmp): New.
>         (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
>         (ix86_expand_int_vec_cmp): New.
>         (ix86_get_mask_mode): New.
>         (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * config/i386/sse.md (avx512fmaskmodelower): New.
>         (vec_cmp<mode><avx512fmaskmodelower>): New.
>         (vec_cmp<mode><sseintvecmodelower>): New.
>         (vec_cmpv2div2di): New.
>         (vec_cmpu<mode><avx512fmaskmodelower>): New.
>         (vec_cmpu<mode><sseintvecmodelower>): New.
>         (vec_cmpuv2div2di): New.
>         (maskload<mode>): Rename to ...
>         (maskload<mode><sseintvecmodelower>): ... this.
>         (maskstore<mode>): Rename to ...
>         (maskstore<mode><sseintvecmodelower>): ... this.
>         (maskload<mode><avx512fmaskmodelower>): New.
>         (maskstore<mode><avx512fmaskmodelower>): New.
>         * doc/tm.texi: Regenerated.
>         * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
>         * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
>         * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
>         (expand_MASK_STORE): Likewise.
>         * optabs.c (vector_compare_rtx): Add OPNO arg.
>         (expand_vec_cond_expr): Adjust to vector_compare_rtx change.
>         (get_vec_cmp_icode): New.
>         (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * optabs.def (vec_cmp_optab): New.
>         (vec_cmpu_optab): New.
>         (maskload_optab): Transform into convert optab.
>         (maskstore_optab): Likewise.
>         * optabs.h (expand_vec_cmp_expr_p): New.
>         (expand_vec_cmp_expr): New.
>         (can_vec_mask_load_store_p): Add MASK_MODE arg.
>         * target.def (get_mask_mode): New.
>         * targhooks.c (default_vector_alignment): Use mode alignment
>         for vector masks.
>         (default_get_mask_mode): New.
>         * targhooks.h (default_get_mask_mode): New.
>         * tree-cfg.c (verify_gimple_comparison): Support vector mask.
>         * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
>         can_vec_mask_load_store_p signature change.
>         (predicate_mem_writes): Use boolean mask.
>         * tree-vect-data-refs.c (vect_get_new_vect_var): Support 
> vect_mask_var.
>         (vect_create_destination_var): Likewise.
>         * tree-vect-generic.c (expand_vector_comparison): Use
>         expand_vec_cmp_expr_p for comparison availability.
>         (expand_vector_operations_1): Ignore statements with scalar mode.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
>         operations for VF.  Add mask type computation.
>         * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask
>         constant.
>         (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
>         signature change.
>         (vectorizable_comparison): New.
>         (vect_analyze_stmt): Add vectorizable_comparison.
>         (vect_transform_stmt): Likewise.
>         (get_mask_type_for_scalar_type): New.
>         * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
>         (enum stmt_vec_info_type): Add comparison_vec_info_type.
>         (get_mask_type_for_scalar_type): New.
>         * tree.c (build_truth_vector_type): New.
>         (truth_type_for): Use build_truth_vector_type for vectors.
>         * tree.h (build_truth_vector_type): New.

Re: [RFC] Try vector as a new representation for vector masks

Reply via email to