Adding CCs.

2015-09-03 15:03 GMT+03:00 Ilya Enkovich <enkovich....@gmail.com>:
> 2015-09-01 17:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Tue, Sep 1, 2015 at 3:08 PM, Ilya Enkovich <enkovich....@gmail.com> wrote:
>>> On 27 Aug 09:55, Richard Biener wrote:
>>>> On Wed, Aug 26, 2015 at 5:51 PM, Ilya Enkovich <enkovich....@gmail.com> 
>>>> wrote:
>>>> >
>>>> > Yes, I want to try it. But getting rid of bool patterns would mean
>>>> > support for all targets currently supporting vec_cond. Would it be OK
>>>> > to have vector<bool> mask co-exist with bool patterns for some time?
>>>>
>>>> No, I'd like to remove the bool patterns anyhow - the vectorizer should be 
>>>> able
>>>> to figure out the correct vector type (or mask type) from the uses.  
>>>> Currently
>>>> it simply looks at the stmts LHS type but as all stmt operands already have
>>>> vector types it can as well compute the result type from those.  We'd want 
>>>> to
>>>> have a helper function that does this result type computation as I figure 
>>>> it
>>>> will be needed in multiple places.
>>>>
>>>> This is now on my personal TODO list (but that's already quite long for 
>>>> GCC 6),
>>>> so if you manage to get to that...  see
>>>> tree-vect-loop.c:vect_determine_vectorization_factor
>>>> which computes STMT_VINFO_VECTYPE for all stmts but loads (loads get their
>>>> vector type set from data-ref analysis already - there 'bool' loads
>>>> correctly get
>>>> VNQImode).  There is a basic-block / SLP part as well that would need to 
>>>> use
>>>> the helper function (eventually with some SLP discovery order issue).
>>>>
>>>> > Thus first step would be to require vector<bool> for MASK_LOAD and
>>>> > MASK_STORE and support it for i386 (the only user of MASK_LOAD and
>>>> > MASK_STORE).
>>>>
>>>> You can certainly try that first, but as soon as you hit complications with
>>>> needing to adjust bool patterns then I'd rather get rid of them.
>>>>
>>>> >
>>>> > I can directly build a vector type with specified mode to avoid it. 
>>>> > Smth. like:
>>>> >
>>>> > mask_mode = targetm.vectorize.get_mask_mode (nunits, 
>>>> > current_vector_size);
>>>> > mask_type = make_vector_type (bool_type_node, nunits, mask_mode);
>>>>
>>>> Hmm, indeed, that might be a (good) solution.  Btw, in this case
>>>> target attribute
>>>> boundaries would be "ignored" (that is, TYPE_MODE wouldn't change depending
>>>> on the active target).  There would also be no way for the user to
>>>> declare vector<bool>
>>>> in source (which is good because of that target attribute issue...).
>>>>
>>>> So yeah.  Adding a tree.c:build_truth_vector_type (unsigned nunits)
>>>> and adjusting
>>>> truth_type_for is the way to go.
>>>>
>>>> I suggest you try modifying those parts first according to this scheme
>>>> that will most
>>>> likely uncover issues we missed.
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>
>>> I tried to implement this scheme and apply it for MASK_LOAD and MASK_STORE. 
>>>  There were no major issues (for now).
>>>
>>> build_truth_vector_type and get_mask_type_for_scalar_type were added to 
>>> build a mask type.  It is always a vector of bools but its mode is 
>>> determined by a target using number of units and currently used vector 
>>> length.
>>>
>>> As previously I fixed if-conversion to apply boolean masks for loads and 
>>> stores which automatically disables bool patterns for them and flow goes by 
>>> a mask path.  Vectorization factor computation is fixed to have a separate 
>>> computation for mask types.  Comparison is now handled separately by 
>>> vectorizer and is vectorized into vector comparison.
>>>
>>> Optabs for masked loads and stores were transformed into convert optabs.  
>>> Now it is checked using both value and mask modes.
>>>
>>> Optabs for comparison were added.  These are also convert optabs checking 
>>> value and result type.
>>>
>>> I had to introduce significant number of new patterns in i386 target to 
>>> support new optabs.  The reason was vector compare was never expanded 
>>> separately and always was a part of a vec_cond expansion.
>>
>> Indeed.
>>
>>> As a result it's possible to use the sage GIMPLE representation for both 
>>> vector and scalar masks target type.  Here is an example I used as a simple 
>>> test:
>>>
>>>   for (i=0; i<N; i++)
>>>   {
>>>     float t = a[i];
>>>     if (t > 0.0f && t < 1.0e+2f)
>>>       if (c[i] != 0)
>>>         c[i] = 1;
>>>   }
>>>
>>> Produced vector GIMPLE (before expand):
>>>
>>>   vect_t_5.22_105 = MEM[base: _256, offset: 0B];
>>>   mask__6.23_107 = vect_t_5.22_105 > { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
>>> 0.0 };
>>>   mask__7.25_109 = vect_t_5.22_105 < { 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2, 
>>> 1.0e+2, 1.0e+2, 1.0e+2, 1.0e+2 };
>>>   mask__8.27_110 = mask__6.23_107 & mask__7.25_109;
>>>   vect__9.29_116 = MASK_LOAD (vectp_c.30_114, 0B, mask__8.27_110);
>>>   mask__36.33_119 = vect__9.29_116 != { 0, 0, 0, 0, 0, 0, 0, 0 };
>>>   mask__37.35_120 = mask__8.27_110 & mask__36.33_119;
>>>   MASK_STORE (vectp_c.38_125, 0B, mask__37.35_120, { 1, 1, 1, 1, 1, 1, 1, 1 
>>> });
>>
>> Looks good to me.
>>
>>> Produced assembler on AVX-512:
>>>
>>>         vmovups (%rdi), %zmm0
>>>         vcmpps  $25, %zmm5, %zmm0, %k1
>>>         vcmpps  $22, %zmm3, %zmm0, %k1{%k1}
>>>         vmovdqa32       -64(%rdx), %zmm2{%k1}
>>>         vpcmpd  $4, %zmm1, %zmm2, %k1{%k1}
>>>         vmovdqa32       %zmm4, (%rcx){%k1}
>>>
>>> Produced assembler on AVX-2:
>>>
>>>         vmovups (%rdx), %xmm1
>>>         vinsertf128     $0x1, -16(%rdx), %ymm1, %ymm1
>>>         vcmpltps        %ymm1, %ymm3, %ymm0
>>>         vcmpltps        %ymm5, %ymm1, %ymm1
>>>         vpand   %ymm0, %ymm1, %ymm0
>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm1
>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>         vpcmpeqd        %ymm2, %ymm1, %ymm1
>>>         vpand   %ymm0, %ymm1, %ymm0
>>>         vpmaskmovd      %ymm4, %ymm0, (%rax)
>>>
>>> BTW AVX-2 code produced by trunk compiler is 4 insns longer:
>>>
>>>         vmovups (%rdx), %xmm0
>>>         vinsertf128     $0x1, -16(%rdx), %ymm0, %ymm0
>>>         vcmpltps        %ymm0, %ymm6, %ymm1
>>>         vcmpltps        %ymm7, %ymm0, %ymm0
>>>         vpand   %ymm1, %ymm5, %ymm2
>>>         vpand   %ymm0, %ymm2, %ymm1
>>>         vpcmpeqd        %ymm3, %ymm1, %ymm0
>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>         vpmaskmovd      -32(%rcx), %ymm0, %ymm0
>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>         vpandn  %ymm1, %ymm0, %ymm0
>>>         vpcmpeqd        %ymm3, %ymm0, %ymm0
>>>         vpandn  %ymm4, %ymm0, %ymm0
>>>         vpmaskmovd      %ymm5, %ymm0, (%rax)
>>>
>>>
>>> For now I still don't disable bool patterns, thus new masks apply to masked 
>>> loads and stores only.  Patch is also not tested and tried on several small 
>>> tests only.  Could you please look at what I currently have and say if it's 
>>> in sync with your view on vector masking?
>>
>> So apart from bool patterns and maybe implementation details (didn't
>> look too closely at the patch yet, maybe tomorrow), there is
>>
>> +  /* Or a boolean vector type with the same element count
>> +     as the comparison operand types.  */
>> +  else if (TREE_CODE (type) == VECTOR_TYPE
>> +          && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
>> +    {
>>
>> so we now allow both, integer typed and boolean typed comparison
>> results?  I was hoping that on GIMPLE
>> we can canonicalize to a single form, the boolean one and for the
>> "old" style force the use of VEC_COND exprs
>> (which we did anyway, AFAIK).  The comparison in the VEC_COND would
>> still have vector bool result type.
>>
>> I expect the vectorization factor changes to "vanish" if we remove
>> bool patterns and re-org vector type deduction
>>
>> Richard.
>>
>
> Totally disabling old style vector comparison and bool pattern is a
> goal but doing hat would mean a lot of regressions for many targets.
> Do you want to it to be tried to estimate amount of changes required
> and reveal possible issues? What would be integration plan for these
> changes? Do you want to just introduce new vector<bool> in GIMPLE
> disabling bool patterns and then resolving vectorization regression on
> all targets or allow them live together with following target switch
> one by one from bool patterns with finally removing them? Not all
> targets are likely to be adopted fast I suppose.
>
> Ilya

Reply via email to