Hi Richard,

Thanks very much for reviewing my patch.  I'll update it according to your
comments.  Before sending the next version, I've embedded several questions
below for further discussion.
On 2019/7/2 8:43 PM, Richard Biener wrote:
> On Wed, 20 Mar 2019, Kewen.Lin wrote:
>
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>
> Use
>
>   /* { dg-additional-options "-mvsx" { target { powerpc...
>
> that saves duplicate typing.  I guess that x86_64/i?86 also works
> if you enable SSE2 - can you check that and do adjustments accordingly?
>

OK, I'll learn SSE2 and update it.

>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
>> +    - ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF.  */
>> +struct v_info
>> +{
>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>
> with SVE this probably needs to be poly_int64 or so
>

OK, I will extend it to poly_int64 (can it be negative?  Or is poly_uint64
better?).

>> +  auto_vec<unsigned, 32> ops_indexes;
>> +};
>
> To have less allocations you could use a
>
>   auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;
>
> or even
>
>   hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >
>
> where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS
> (vector_type)) during collecting and then can use quick_push ()
> (ah, no - duplicates...).
>

Good idea!

>> +
>> +typedef struct v_info *v_info_ptr;
>> +
>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>> +static int
>> +unsigned_cmp (const void *p_i, const void *p_j)
>> +{
>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>> +    return 1;
>> +  else
>> +    return -1;
>
> That's an issue with some qsort implementations comparing
> an element against itself.
>
> I think so you should add the equality case.
>

The equality case seems already covered by ">=".  Do you mean I should handle
it explicitly and return zero for "=="?  Like:

  const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i;
  const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j;
  if (val_i != val_j)
    return val_i > val_j ? 1 : -1;
  else
    return 0;

>> +
>> +   TODO:
>> +     1) The current implementation restrict all candidate VECTORs should have
>> +        the same VECTOR type, but it can be extended into different groups by
>> +        VECTOR types in future if any profitable cases found.
>> +     2) The current implementation requires the whole VECTORs should be fully
>> +        covered, but it can be extended to support partial, checking adjacent
>> +        but not fill the whole, it may need some cost model to define the
>> +        boundary to do or not.
>> +
>> +  tree elem_type = TREE_TYPE (vec_type);
>> +  unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>> +  if (size != TREE_INT_CST_LOW (op1))
>
>   if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
>
> avoids some of the TREE_INT_CST_LOW we like to avoid.
>
> You miss a check for op2 % op1 being zero.  Given you store a
> HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
> (beware of SVE...).

OK, thanks!  For op2 % op1 == zero, I had taken it for granted; I'll fix it.
I think such IR can be constructed artificially, but since I have almost no
knowledge of other targets' vector support, I'm curious whether it occurs in
real-world code.  Btw, BIT_FIELD_REF in tree.def says op1/op2 are constants;
does that need to be updated for SVE?
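
For those checks, I'm currently thinking of something along these lines
(just an untested sketch to confirm I understood your comments correctly;
valid_vector_element_p is only a placeholder name, not something from the
current patch):

  /* Return true if OP1 (the referenced size) and OP2 (the offset) of a
     BIT_FIELD_REF select exactly one whole element of VEC_TYPE.  */
  static bool
  valid_vector_element_p (tree vec_type, tree op1, tree op2)
  {
    tree elem_type = TREE_TYPE (vec_type);
    /* Bail out on non-constant size/offset for now (think SVE); the
       poly-int friendly checks can be added later.  */
    if (TREE_CODE (op1) != INTEGER_CST || TREE_CODE (op2) != INTEGER_CST)
      return false;
    /* The reference must be exactly one element wide ...  */
    if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
      return false;
    /* ... and must start on an element boundary (op2 % op1 == 0).  */
    return tree_to_uhwi (op2) % tree_to_uhwi (op1) == 0;
  }

Does that match what you had in mind?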
> There's also a poly-int friendly multiple_p predicate so you could
> have the initial checks SVE friendly but bail out on non-INTEGER_CST
> offset later.
>

Got it, thanks!

> Since you are using a hashtable keyed off SSA name pointers code
> generation will depend on host addresses here if you consider
> there's one mismatching vector type that might become ref_vec
> dependent on the order of elements in the hashtable.  That's
> a no-no.
>
> Even if doing it like above looks to possibly save compile-time
> by avoiding qsort calls I think the appropriate thing to do is
> to partition the vector candidates into sets of compatible
> vectors (consider summing two v2df and two v4df vectors for example)
> and then take out ones you disqualify because they do not cover
> the vector in full.
>

You are absolutely right, the random order of the hashtable keys could be a
problem here.  The partitioning part is TODO 1): since Power currently has
only one whole-vector width (no v2df/v4df co-existence), it was left as a
TODO.  I will extend it in the next version of the patch and add one more
hashtable mapping the vector type mode to its v_info_map; feel free to
correct me if you have any concerns.

> I think whether the vector is fully covered can be done way cheaper
> as well by using a sbitmap and clearing a bit whenever its
> corresponding lane is in the vector (and bailing out if the bit
> was already clear since you then got a duplicate).  So start
> with bitmap_ones () and at the end check bitmap_empty_p ().
> I don't think you depend on the vector being sorted later?
>

Good idea; yes, it doesn't depend on the vector being sorted (see the sketch
at the end of this mail).

>> +	{
>> +	  sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>> +	  info = *(v_info_map.get (tr));
>> +	  unsigned j;
>> +	  FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>> +	    {
>> +	      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>> +	      gimple_set_visited (def, true);
>
> you set the visited flag to get the ops definition DCEd eventually, right?
> Note this in a comment.
>

Yes, it's for that.  I will add the comment, thanks!
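
Coming back to the sbitmap idea, here is roughly what I have in mind for the
full-coverage check (untested sketch; nunits and elem_size stand for the
constant element count and element size in bits of the candidate vector):

  auto_sbitmap lanes (nunits);
  bitmap_ones (lanes);
  unsigned i;
  unsigned HOST_WIDE_INT off;
  FOR_EACH_VEC_ELT (info->offsets, i, off)
    {
      unsigned HOST_WIDE_INT lane = off / elem_size;
      /* A lane seen twice (or out of range) disqualifies this vector.  */
      if (lane >= nunits || !bitmap_bit_p (lanes, lane))
	return false;
      bitmap_clear_bit (lanes, lane);
    }
  /* Fully covered iff every lane was cleared exactly once.  */
  return bitmap_empty_p (lanes);

I'll use something like that in the next version unless you see a problem
with it.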