Adding CCs.
2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich....@gmail.com>: > 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: >>> On 27 Aug 09:55, Richard Biener wrote: >>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> >>>> wrote: >>>> > >>>> > Yes, I want to try it. But getting rid of bool patterns would mean >>>> > support for all targets currently supporting vec_cond. Would it be OK >>>> > to have vector<bool> mask co-exist with bool patterns for some time? >>>> >>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be >>>> able >>>> to figure out the correct vector type (or mask type) from the uses. >>>> Currently >>>> it simply looks at the stmts LHS type but as all stmt operands already have >>>> vector types it can as well compute the result type from those. We'd want >>>> to >>>> have a helper function that does this result type computation as I figure >>>> it >>>> will be needed in multiple places. >>>> >>>> This is now on my personal TODO list (but that's already quite long for >>>> GCC 6), >>>> so if you manage to get to that... see >>>> tree-vect-loop.c:vect_determine_vectorization_factor >>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their >>>> vector type set from data-ref analysis already - there 'bool' loads >>>> correctly get >>>> VNQImode). There is a basic-block / SLP part as well that would need to >>>> use >>>> the helper function (eventually with some SLP discovery order issue). >>>> >>>> > Thus first step would be to require vector<bool> for MASK_LOAD and >>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and >>>> > MASK_STORE). >>>> >>>> You can certainly try that first, but as soon as you hit complications with >>>> needing to adjust bool patterns then I'd rather get rid of them. >>>> >>>> > >>>> > I can directly build a vector type with specified mode to avoid it. >>>> > Smth. like: >>>> > >>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, >>>> > current_vector_size); >>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode); >>>> >>>> Hmm, indeed, that might be a (good) solution. Btw, in this case >>>> target attribute >>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending >>>> on the active target). There would also be no way for the user to >>>> declare vector<bool> >>>> in source (which is good because of that target attribute issue...). >>>> >>>> So yeah. Adding a tree.c:build_truth_vector_type (unsigned nunits) >>>> and adjusting >>>> truth_type_for is the way to go. >>>> >>>> I suggest you try modifying those parts first according to this scheme >>>> that will most >>>> likely uncover issues we missed. >>>> >>>> Thanks, >>>> Richard. >>>> >>> >>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. >>> There were no major issues (for now). >>> >>> build_truth_vector_type and get_mask_type_for_scalar_type were added to >>> build a mask type. It is always a vector of bools but its mode is >>> determined by a target using number of units and currently used vector >>> length. >>> >>> As previously I fixed if-conversion to apply boolean masks for loads and >>> stores which automatically disables bool patterns for them and flow goes by >>> a mask path. Vectorization factor computation is fixed to have a separate >>> computation for mask types. Comparison is now handled separately by >>> vectorizer and is vectorized into vector comparison. >>> >>> Optabs for masked loads and stores were transformed into convert optabs. >>> Now it is checked using both value and mask modes. >>> >>> Optabs for comparison were added. These are also convert optabs checking >>> value and result type. >>> >>> I had to introduce significant number of new patterns in i386 target to >>> support new optabs. The reason was vector compare was never expanded >>> separately and always was a part of a vec_cond expansion. >> >> Indeed. >> >>> As a result it's possible to use the sage GIMPLE representation for both >>> vector and scalar masks target type. Here is an example I used as a simple >>> test: >>> >>> for (i=0; i<N; i++) >>> { >>> float t = a[i]; >>> if (t > 0.0f && t < 1.0e+2f) >>> if (c[i] != 0) >>> c[i] = 1; >>> } >>> >>> Produced vector GIMPLE (before expand): >>> >>> vect_t_5.22_105 = MEM[base: _256, offset: 0B]; >>> mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, >>> 0.0 }; >>> mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, >>> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 }; >>> mask__8.27_110 = mask__6.23_107 & mask__7.25_109; >>> vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110); >>> mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 }; >>> mask__37.35_120 = mask__8.27_110 & mask__36.33_119; >>> MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 >>> }); >> >> Looks good to me. >> >>> Produced assembler on AVX-512: >>> >>> vmovups (%rdi), %zmm0 >>> vcmpps $25, %zmm5, %zmm0, %k1 >>> vcmpps $22, %zmm3, %zmm0, %k1{%k1} >>> vmovdqa32 -64(%rdx), %zmm2{%k1} >>> vpcmpd $4, %zmm1, %zmm2, %k1{%k1} >>> vmovdqa32 %zmm4, (%rcx){%k1} >>> >>> Produced assembler on AVX-2: >>> >>> vmovups (%rdx), %xmm1 >>> vinsertf128 $0x1, -16(%rdx), %ymm1, %ymm1 >>> vcmpltps %ymm1, %ymm3, %ymm0 >>> vcmpltps %ymm5, %ymm1, %ymm1 >>> vpand %ymm0, %ymm1, %ymm0 >>> vpmaskmovd -32(%rcx), %ymm0, %ymm1 >>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>> vpcmpeqd %ymm2, %ymm1, %ymm1 >>> vpand %ymm0, %ymm1, %ymm0 >>> vpmaskmovd %ymm4, %ymm0, (%rax) >>> >>> BTW AVX-2 code produced by trunk compiler is 4 insns longer: >>> >>> vmovups (%rdx), %xmm0 >>> vinsertf128 $0x1, -16(%rdx), %ymm0, %ymm0 >>> vcmpltps %ymm0, %ymm6, %ymm1 >>> vcmpltps %ymm7, %ymm0, %ymm0 >>> vpand %ymm1, %ymm5, %ymm2 >>> vpand %ymm0, %ymm2, %ymm1 >>> vpcmpeqd %ymm3, %ymm1, %ymm0 >>> vpandn %ymm4, %ymm0, %ymm0 >>> vpmaskmovd -32(%rcx), %ymm0, %ymm0 >>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>> vpandn %ymm1, %ymm0, %ymm0 >>> vpcmpeqd %ymm3, %ymm0, %ymm0 >>> vpandn %ymm4, %ymm0, %ymm0 >>> vpmaskmovd %ymm5, %ymm0, (%rax) >>> >>> >>> For now I still don't disable bool patterns, thus new masks apply to masked >>> loads and stores only. Patch is also not tested and tried on several small >>> tests only. Could you please look at what I currently have and say if it's >>> in sync with your view on vector masking? >> >> So apart from bool patterns and maybe implementation details (didn't >> look too closely at the patch yet, maybe tomorrow), there is >> >> + /* Or a boolean vector type with the same element count >> + as the comparison operand types. */ >> + else if (TREE_CODE (type) == VECTOR_TYPE >> + && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) >> + { >> >> so we now allow both, integer typed and boolean typed comparison >> results? I was hoping that on GIMPLE >> we can canonicalize to a single form, the boolean one and for the >> "old" style force the use of VEC_COND exprs >> (which we did anyway, AFAIK). The comparison in the VEC_COND would >> still have vector bool result type. >> >> I expect the vectorization factor changes to "vanish" if we remove >> bool patterns and re-org vector type deduction >> >> Richard. >> > > Totally disabling old style vector comparison and bool pattern is a > goal but doing hat would mean a lot of regressions for many targets. > Do you want to it to be tried to estimate amount of changes required > and reveal possible issues? What would be integration plan for these > changes? Do you want to just introduce new vector<bool> in GIMPLE > disabling bool patterns and then resolving vectorization regression on > all targets or allow them live together with following target switch > one by one from bool patterns with finally removing them? Not all > targets are likely to be adopted fast I suppose. > > Ilya