On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: > On 27 Aug 09:55, Richard Biener wrote: >> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> >> wrote: >> > >> > Yes, I want to try it. But getting rid of bool patterns would mean >> > support for all targets currently supporting vec_cond. Would it be OK >> > to have vector<bool> mask co-exist with bool patterns for some time? >> >> No, I'd like to remove the bool patterns anyhow - the vectorizer should be >> able >> to figure out the correct vector type (or mask type) from the uses. >> Currently >> it simply looks at the stmts LHS type but as all stmt operands already have >> vector types it can as well compute the result type from those. We'd want to >> have a helper function that does this result type computation as I figure it >> will be needed in multiple places. >> >> This is now on my personal TODO list (but that's already quite long for GCC >> 6), >> so if you manage to get to that... see >> tree-vect-loop.c:vect_determine_vectorization_factor >> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >> vector type set from data-ref analysis already - there 'bool' loads >> correctly get >> VNQImode). There is a basic-block / SLP part as well that would need to use >> the helper function (eventually with some SLP discovery order issue). >> >> > Thus first step would be to require vector<bool> for MASK_LOAD and >> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >> > MASK_STORE). >> >> You can certainly try that first, but as soon as you hit complications with >> needing to adjust bool patterns then I'd rather get rid of them. >> >> > >> > I can directly build a vector type with specified mode to avoid it. Smth. >> > like: >> > >> > mask_mode = targetm.vectorize.get_mask_mode (nunits, current_vector_size); >> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >> >> Hmm, indeed, that might be a (good) solution. Btw, in this case >> target attribute >> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending >> on the active target). There would also be no way for the user to >> declare vector<bool> >> in source (which is good because of that target attribute issue...). >> >> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >> and adjusting >> truth_type_for is the way to go. >> >> I suggest you try modifying those parts first according to this scheme >> that will most >> likely uncover issues we missed. >> >> Thanks, >> Richard. >> > > I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. > There were no major issues (for now). > > build_truth_vector_type and get_mask_type_for_scalar_type were added to build > a mask type. It is always a vector of bools but its mode is determined by a > target using number of units and currently used vector length. > > As previously I fixed if-conversion to apply boolean masks for loads and > stores which automatically disables bool patterns for them and flow goes by a > mask path. Vectorization factor computation is fixed to have a separate > computation for mask types. Comparison is now handled separately by > vectorizer and is vectorized into vector comparison. > > Optabs for masked loads and stores were transformed into convert optabs. Now > it is checked using both value and mask modes. > > Optabs for comparison were added. These are also convert optabs checking > value and result type. > > I had to introduce significant number of new patterns in i386 target to > support new optabs. The reason was vector compare was never expanded > separately and always was a part of a vec_cond expansion.
Indeed. > As a result it's possible to use the sage GIMPLE representation for both > vector and scalar masks target type. Here is an example I used as a simple > test: > > for (i=0; i<N; i++) > { > float t = a[i]; > if (t > 0.0f && t < 1.0e+2f) > if (c[i] != 0) > c[i] = 1; > } > > Produced vector GIMPLE (before expand): > > vect_t_5.22_105 = MEM[base: _256, offset: 0B]; > mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 > }; > mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, > 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; > mask__8.27_110 = mask__6.23_107 & mask__7.25_109; > vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); > mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; > mask__37.35_120 = mask__8.27_110 & mask__36.33_119; > MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 > }); Looks good to me. > Produced assembler on AVX-512: > > vmovups (%rdi), %zmm0 > vcmpps $25, %zmm5, %zmm0, %k1 > vcmpps $22, %zmm3, %zmm0, %k1{%k1} > vmovdqa32 -64(%rdx), %zmm2{%k1} > vpcmpd $4, %zmm1, %zmm2, %k1{%k1} > vmovdqa32 %zmm4, (%rcx){%k1} > > Produced assembler on AVX-2: > > vmovups (%rdx), %xmm1 > vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 > vcmpltps %ymm1, %ymm3, %ymm0 > vcmpltps %ymm5, %ymm1, %ymm1 > vpand %ymm0, %ymm1, %ymm0 > vpmaskmovd -32(%rcx), %ymm0, %ymm1 > vpcmpeqd %ymm2, %ymm1, %ymm1 > vpcmpeqd %ymm2, %ymm1, %ymm1 > vpand %ymm0, %ymm1, %ymm0 > vpmaskmovd %ymm4, %ymm0, (%rax) > > BTW AVX-2 code produced by trunk compiler is 4 insns longer: > > vmovups (%rdx), %xmm0 > vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 > vcmpltps %ymm0, %ymm6, %ymm1 > vcmpltps %ymm7, %ymm0, %ymm0 > vpand %ymm1, %ymm5, %ymm2 > vpand %ymm0, %ymm2, %ymm1 > vpcmpeqd %ymm3, %ymm1, %ymm0 > vpandn %ymm4, %ymm0, %ymm0 > vpmaskmovd -32(%rcx), %ymm0, %ymm0 > vpcmpeqd %ymm3, %ymm0, %ymm0 > vpandn %ymm1, %ymm0, %ymm0 > vpcmpeqd %ymm3, %ymm0, %ymm0 > vpandn %ymm4, %ymm0, %ymm0 > vpmaskmovd %ymm5, %ymm0, (%rax) > > > For now I still don't disable bool patterns, thus new masks apply to masked > loads and stores only. Patch is also not tested and tried on several small > tests only. Could you please look at what I currently have and say if it's > in sync with your view on vector masking? So apart from bool patterns and maybe implementation details (didn't look too closely at the patch yet, maybe tomorrow), there is + /* Or a boolean vector type with the same element count + as the comparison operand types. */ + else if (TREE_CODE (type) == VECTOR_TYPE + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) + { so we now allow both, integer typed and boolean typed comparison results? I was hoping that on GIMPLE we can canonicalize to a single form, the boolean one and for the "old" style force the use of VEC_COND exprs (which we did anyway, AFAIK). The comparison in the VEC_COND would still have vector bool result type. I expect the vectorization factor changes to "vanish" if we remove bool patterns and re-org vector type deduction Richard. > Thanks, > Ilya > -- > gcc/ > > 2015-09-01 Ilya Enkovich <enkovich....@gmail.com> > > * config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New. > (ix86_expand_int_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > * config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for > op_true and op_false. > (ix86_int_cmp_code_to_pcmp_immediate): New. > (ix86_fp_cmp_code_to_pcmp_immediate): New. > (ix86_cmp_code_to_pcmp_immediate): New. > (ix86_expand_mask_vec_cmp): New. > (ix86_expand_fp_vec_cmp): New. > (ix86_expand_int_sse_cmp): New. > (ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp. > (ix86_expand_int_vec_cmp): New. > (ix86_get_mask_mode): New. > (TARGET_VECTORIZE_GET_MASK_MODE): New. > * config/i386/sse.md (avx512fmaskmodelower): New. > (vec_cmp<mode><avx512fmaskmodelower>): New. > (vec_cmp<mode><sseintvecmodelower>): New. > (vec_cmpv2div2di): New. > (vec_cmpu<mode><avx512fmaskmodelower>): New. > (vec_cmpu<mode><sseintvecmodelower>): New. > (vec_cmpuv2div2di): New. > (maskload<mode>): Rename to ... > (maskload<mode><sseintvecmodelower>): ... this. > (maskstore<mode>): Rename to ... > (maskstore<mode><sseintvecmodelower>): ... this. > (maskload<mode><avx512fmaskmodelower>): New. > (maskstore<mode><avx512fmaskmodelower>): New. > * doc/tm.texi: Regenerated. > * doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New. > * expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results. > * internal-fn.c (expand_MASK_LOAD): Adjust to optab changes. > (expand_MASK_STORE): Likewise. > * optabs.c (vector_compare_rtx): Add OPNO arg. > (expand_vec_cond_expr): Adjust to vector_compare_rtx change. > (get_vec_cmp_icode): New. > (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * optabs.def (vec_cmp_optab): New. > (vec_cmpu_optab): New. > (maskload_optab): Transform into convert optab. > (maskstore_optab): Likewise. > * optabs.h (expand_vec_cmp_expr_p): New. > (expand_vec_cmp_expr): New. > (can_vec_mask_load_store_p): Add MASK_MODE arg. > * target.def (get_mask_mode): New. > * targhooks.c (default_vector_alignment): Use mode alignment > for vector masks. > (default_get_mask_mode): New. > * targhooks.h (default_get_mask_mode): New. > * tree-cfg.c (verify_gimple_comparison): Support vector mask. > * tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to > can_vec_mask_load_store_p signature change. > (predicate_mem_writes): Use boolean mask. > * tree-vect-data-refs.c (vect_get_new_vect_var): Support > vect_mask_var. > (vect_create_destination_var): Likewise. > * tree-vect-generic.c (expand_vector_comparison): Use > expand_vec_cmp_expr_p for comparison availability. > (expand_vector_operations_1): Ignore statements with scalar mode. > * tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask > operations for VF. Add mask type computation. > * tree-vect-stmts.c (vect_get_vec_def_for_operand): Support mask > constant. > (vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p > signature change. > (vectorizable_comparison): New. > (vect_analyze_stmt): Add vectorizable_comparison. > (vect_transform_stmt): Likewise. > (get_mask_type_for_scalar_type): New. > * tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var > (enum stmt_vec_info_type): Add comparison_vec_info_type. > (get_mask_type_for_scalar_type): New. > * tree.c (build_truth_vector_type): New. > (truth_type_for): Use build_truth_vector_type for vectors. > * tree.h (build_truth_vector_type): New.