Hi Richard,

Thanks very much for reviewing my patch.  I'll update it according to your
comments.  Before sending the next version, I've embedded several questions
below for further discussion.
On 2019/7/2 8:43 PM, Richard Biener wrote:
> On Wed, 20 Mar 2019, Kewen.Lin wrote:
>
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>
> Use
>
>   /* { dg-additional-options "-mvsx" { target { powerpc...
>
> that saves duplicate typing.  I guess that x86_64/i?86 also works
> if you enable SSE2 - can you check that and do adjustments accordingly?
>

OK, I'll learn SSE2 and update it.

>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
>> +    - ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF.  */
>> +struct v_info
>> +{
>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>
> with SVE this probably needs to be poly_int64 or so
>

OK, I will extend it to poly_int64 (can it be negative?  Or is poly_uint64
better?).

>> +  auto_vec<unsigned, 32> ops_indexes;
>> +};
>
> To have less allocations you could use a
>
>   auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;
>
> or even
>
>   hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >
>
> where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS
> (vector_type)) during collecting and then can use quick_push ()
> (ah, no - duplicates...).
>

Good idea!

>> +
>> +typedef struct v_info *v_info_ptr;
>> +
>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>> +static int
>> +unsigned_cmp (const void *p_i, const void *p_j)
>> +{
>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>> +    return 1;
>> +  else
>> +    return -1;
>
> That's an issue with some qsort implementations comparing
> an element against itself.
>
> I think so you should add the equality case.
>

The equality case seems already covered by ">=".  Do you mean I should handle
it explicitly and return zero for "=="?  Like:

  const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i;
  const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j;
  if (val_i != val_j)
    return val_i > val_j ? 1 : -1;
  else
    return 0;

>> +
>> +   TODO:
>> +     1) The current implementation restrict all candidate VECTORs should have
>> +        the same VECTOR type, but it can be extended into different groups by
>> +        VECTOR types in future if any profitable cases found.
>> +     2) The current implementation requires the whole VECTORs should be fully
>> +        covered, but it can be extended to support partial, checking adjacent
>> +        but not fill the whole, it may need some cost model to define the
>> +        boundary to do or not.
>> +
>> +  tree elem_type = TREE_TYPE (vec_type);
>> +  unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>> +  if (size != TREE_INT_CST_LOW (op1))
>
>   if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
>
> avoids some of the TREE_INT_CST_LOW we like to avoid.
>
> You miss a check for op2 % op1 being zero.  Given you store a
> HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
> (beware of SVE...).

OK, thanks!  For op2 % op1 == zero, I had taken it for granted; I'll fix it.
I think such IR can be constructed artificially, but since I have almost no
knowledge of other targets' vector support, I'm curious whether it occurs in
real-world code.  Btw, BIT_FIELD_REF in tree.def says op1/op2 are constants;
does that need to be updated for SVE?
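
For those checks, I'm currently thinking of something along these lines
(just an untested sketch to confirm I understood your comments correctly;
valid_vector_element_p is only a placeholder name, not something from the
current patch):

  /* Return true if OP1 (the referenced size) and OP2 (the offset) of a
     BIT_FIELD_REF select exactly one whole element of VEC_TYPE.  */
  static bool
  valid_vector_element_p (tree vec_type, tree op1, tree op2)
  {
    tree elem_type = TREE_TYPE (vec_type);
    /* Bail out on non-constant size/offset for now (think SVE); the
       poly-int friendly checks can be added later.  */
    if (TREE_CODE (op1) != INTEGER_CST || TREE_CODE (op2) != INTEGER_CST)
      return false;
    /* The reference must be exactly one element wide ...  */
    if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
      return false;
    /* ... and must start on an element boundary (op2 % op1 == 0).  */
    return tree_to_uhwi (op2) % tree_to_uhwi (op1) == 0;
  }

Does that match what you had in mind?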
> There's also a poly-int friendly multiple_p predicate so you could
> have the initial checks SVE friendly but bail out on non-INTEGER_CST
> offset later.
>

Got it, thanks!

> Since you are using a hashtable keyed off SSA name pointers code
> generation will depend on host addresses here if you consider
> there's one mismatching vector type that might become ref_vec
> dependent on the order of elements in the hashtable.  That's
> a no-no.
>
> Even if doing it like above looks to possibly save compile-time
> by avoiding qsort calls I think the appropriate thing to do is
> to partition the vector candidates into sets of compatible
> vectors (consider summing two v2df and two v4df vectors for example)
> and then take out ones you disqualify because they do not cover
> the vector in full.
>

You are absolutely right, the random order of the hashtable keys could be a
problem here.  The partitioning part is TODO 1): since Power currently has
only one whole-vector width (no v2df/v4df co-existence), it was left as a
TODO.  I will extend it in the next version of the patch and add one more
hashtable mapping the vector type mode to its v_info_map; feel free to
correct me if you have any concerns.

> I think whether the vector is fully covered can be done way cheaper
> as well by using a sbitmap and clearing a bit whenever its
> corresponding lane is in the vector (and bailing out if the bit
> was already clear since you then got a duplicate).  So start
> with bitmap_ones () and at the end check bitmap_empty_p ().
> I don't think you depend on the vector being sorted later?
>

Good idea; yes, it doesn't depend on the vector being sorted (see the sketch
at the end of this mail).

>> +	{
>> +	  sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>> +	  info = *(v_info_map.get (tr));
>> +	  unsigned j;
>> +	  FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>> +	    {
>> +	      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>> +	      gimple_set_visited (def, true);
>
> you set the visited flag to get the ops definition DCEd eventually, right?
> Note this in a comment.
>

Yes, it's for that.  I will add the comment, thanks!
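
Coming back to the sbitmap idea, here is roughly what I have in mind for the
full-coverage check (untested sketch; nunits and elem_size stand for the
constant element count and element size in bits of the candidate vector):

  auto_sbitmap lanes (nunits);
  bitmap_ones (lanes);
  unsigned i;
  unsigned HOST_WIDE_INT off;
  FOR_EACH_VEC_ELT (info->offsets, i, off)
    {
      unsigned HOST_WIDE_INT lane = off / elem_size;
      /* A lane seen twice (or out of range) disqualifies this vector.  */
      if (lane >= nunits || !bitmap_bit_p (lanes, lane))
	return false;
      bitmap_clear_bit (lanes, lane);
    }
  /* Fully covered iff every lane was cleared exactly once.  */
  return bitmap_empty_p (lanes);

I'll use something like that in the next version unless you see a problem
with it.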