On Mon, 28 Oct 2019, Andre Vieira (lists) wrote: > Hi, > > Reworked according to your comments, see inline for clarification. > > Is this OK for trunk?
+ gimple_seq seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo); + while (seq) + { + stmt_worklist.safe_push (seq); + seq = seq->next; + } you're supposed to do to the following, not access the ->next implementation detail: for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi); gsi_next (&gsi)) stmt_worklist.safe_push (gsi_stmt (gsi)); + /* Data references for gather loads and scatter stores do not use the + updated offset we set using ADVANCE. Instead we have to make sure the + reference in the data references point to the corresponding copy of + the original in the epilogue. */ + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) + { + DR_REF (dr) + = simplify_replace_tree (DR_REF (dr), NULL_TREE, NULL_TREE, + &find_in_mapping, &mapping); + DR_BASE_ADDRESS (dr) + = simplify_replace_tree (DR_BASE_ADDRESS (dr), NULL_TREE, NULL_TREE, + &find_in_mapping, &mapping); + } Hmm. So for other DRs we account for the previous vector loop by adjusting DR_OFFSET? But STMT_VINFO_GATHER_SCATTER_P ends up using (unconditionally) DR_REF here? In that case it seems best to adjust DR_REF only but NULL out DR_BASE_ADDRESS and DR_OFFSET? I wonder how prologue peeling deals with STMT_VINFO_GATHER_SCATTER_P ... I see the caller of vect_update_init_of_dr there does nothing for STMT_VINFO_GATHER_SCATTER_P. I wonder if (as followup to not delay this further) we can "offload" all the DR adjustment by storing ADVANCE in dr_vec_info and accounting for that when we create the dataref pointers in vectorizable_load/store? That way we could avoid saving/restoring DR_OFFSET as well. So, the patch is OK with the sequence iteration fixed. I think sorting out the above can be done as followup. Thanks, Richard. > gcc/ChangeLog: > 2019-10-28 Andre Vieira <andre.simoesdiasvie...@arm.com> > > PR 88915 > * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. > * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter > and make the valueize function pointer also take a void pointer. > * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > around vn_valueize, to call it without a context. > (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > (~_loop_vec_info): Release epilogue_vinfos. > (vect_analyze_loop_costing): Use knowledge of main VF to estimate > number of iterations of epilogue. > (vect_analyze_loop_2): Adapt to analyse main loop for all supported > vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > versioning threshold needed for main loop. > (vect_analyze_loop): Likewise. > (find_in_mapping): New helper function. > (update_epilogue_loop_vinfo): New function. > (vect_transform_loop): When vectorizing epilogues re-use analysis done > on main loop and call update_epilogue_loop_vinfo to update it. > * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > stmts on loop preheader edge. > (vect_do_peeling): Enable skip-vectors when doing loop versioning if > we decided to vectorize epilogues. Update epilogues NITERS and > construct ADVANCE to update epilogues data references where needed. > * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > (vect_do_peeling, vect_update_inits_of_drs, > determine_peel_for_niter, vect_analyze_loop): Add or update declarations. > * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > created loop_vec_info's for epilogues when available. Otherwise > analyse > epilogue separately. > > > > Cheers, > Andre > > On 28/10/2019 14:16, Richard Biener wrote: > > On Fri, 25 Oct 2019, Andre Vieira (lists) wrote: > > > >> Hi, > >> > >> This is the reworked patch after your comments. > >> > >> I have moved the epilogue check into the analysis form disguised under > >> '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing > >> the > >> "lowest threshold" check there. > >> > >> The only place where we may reject an epilogue_vinfo is when we know the > >> number of scalar iterations and we realize the number of iterations left > >> after > >> the main loop are not enough to enter the vectorized epilogue so we > >> optimize > >> away that code-gen. The only way we know this to be true is if the number > >> of > >> scalar iterations are known and the peeling for alignment is known. So we > >> know > >> we will enter the main loop regardless, so whether the threshold we use is > >> for > >> a lower VF or not it shouldn't matter as much, I would even like to think > >> that > >> check isn't done, but I am not sure... Might be worth checking as an > >> optimization. > >> > >> > >> Is this OK for trunk? > > > > + for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]); > > + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi)) > > + { > > .. > > + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)) > > + pattern_worklist.safe_push (stmt_vinfo); > > + > > + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > > + while (related_vinfo && related_vinfo != stmt_vinfo) > > + { > > > > I think PHIs cannot have patterns. You can assert > > that STMT_VINFO_RELATED_STMT is NULL I think. > > Done. > > > > + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); > > + while (related_vinfo && related_vinfo != stmt_vinfo) > > + { > > + related_worklist.safe_push (related_vinfo); > > + /* Set BB such that the assert in > > + 'get_initial_def_for_reduction' is able to determine that > > + the BB of the related stmt is inside this loop. */ > > + gimple_set_bb (STMT_VINFO_STMT (related_vinfo), > > + gimple_bb (new_stmt)); > > + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo); > > + } > > > > do we really keep references to "nested" patterns? Thus, do you > > need this loop? > > Changed and added asserts. They didn't trigger so I suppose you are right, I > didn't know at the time whether it was possible, so I just operated on the > side of caution. Can remove the asserts and so on if you want. > > > > + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the > > + original main loop and thus need to be updated to refer to the > > cloned > > + variables used in the epilogue. */ > > + for (unsigned i = 0; i < pattern_worklist.length (); ++i) > > + { > > ... > > + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE, > > + &find_in_mapping, &mapping); > > + gimple_set_op (seq, j, op); > > > > you do this for the pattern-def seq but not for the related one. > > I guess you ran into this for COND_EXPR conditions. I wondered > > to use a shared worklist for both the def-seq and the main pattern > > stmt or at least to split out the replacement so you can share it. > > I think that was it yeah, reworked it now to use the same list. Less code, > thanks! > > > > + /* Data references for gather loads and scatter stores do not use > > the > > + updated offset we set using ADVANCE. Instead we have to make > > sure the > > + reference in the data references point to the corresponding copy > > of > > + the original in the epilogue. */ > > + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) > > + { > > + int j; > > + if (TREE_CODE (DR_REF (dr)) == MEM_REF) > > + j = 0; > > + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF) > > + j = 1; > > + else > > + gcc_unreachable (); > > + > > + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j))) > > + { > > + DR_REF (dr) = unshare_expr (DR_REF (dr)); > > + TREE_OPERAND (DR_REF (dr), j) = *new_op; > > + } > > > > huh, do you really only ever see MEM_REF or ARRAY_REF here? > > I would guess using simplify_replace_tree is safer. > > There's also DR_BASE_ADDRESS - we seem to leave the DRs partially > > updated, is that correct? > > Yeah can use simplify_replace_tree indeed. And I have changed it so it > updates DR_BASE_ADDRESS. I think DR_BASE_ADDRESS never actually changed in > the way we use data_references... Either way, replacing them if they do change > is cleaner and more future proof. > > > > Otherwise looks OK to me. > > > > Thanks, > > Richard. > > > > > >> gcc/ChangeLog: > >> 2019-10-25 Andre Vieira <andre.simoesdiasvie...@arm.com> > >> > >> PR 88915 > >> * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. > >> * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter > >> and make the valueize function pointer also take a void pointer. > >> * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > >> around vn_valueize, to call it without a context. > >> (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > >> * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > >> (~_loop_vec_info): Release epilogue_vinfos. > >> (vect_analyze_loop_costing): Use knowledge of main VF to estimate > >> number of iterations of epilogue. > >> (vect_analyze_loop_2): Adapt to analyse main loop for all supported > >> vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > >> versioning threshold needed for main loop. > >> (vect_analyze_loop): Likewise. > >> (find_in_mapping): New helper function. > >> (update_epilogue_loop_vinfo): New function. > >> (vect_transform_loop): When vectorizing epilogues re-use analysis done > >> on main loop and call update_epilogue_loop_vinfo to update it. > >> * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > >> stmts on loop preheader edge. > >> (vect_do_peeling): Enable skip-vectors when doing loop versioning if > >> we decided to vectorize epilogues. Update epilogues NITERS and > >> construct ADVANCE to update epilogues data references where needed. > >> * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > >> (vect_do_peeling, vect_update_inits_of_drs, > >> determine_peel_for_niter, vect_analyze_loop): Add or update > >> declarations. > >> * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > >> created loop_vec_info's for epilogues when available. Otherwise > >> analyse > >> epilogue separately. > >> > >> > >> > >> Cheers, > >> Andre > >> > > > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)