On Fri, 25 Oct 2019, Andre Vieira (lists) wrote: > Hi, > > This is the reworked patch after your comments. > > I have moved the epilogue check into the analysis form disguised under > '!epilogue_vinfos.is_empty ()'. This because I realized that I am doing the > "lowest threshold" check there. > > The only place where we may reject an epilogue_vinfo is when we know the > number of scalar iterations and we realize the number of iterations left after > the main loop are not enough to enter the vectorized epilogue so we optimize > away that code-gen. The only way we know this to be true is if the number of > scalar iterations are known and the peeling for alignment is known. So we know > we will enter the main loop regardless, so whether the threshold we use is for > a lower VF or not it shouldn't matter as much, I would even like to think that > check isn't done, but I am not sure... Might be worth checking as an > optimization. > > > Is this OK for trunk?
+ for (epilogue_phi_gsi = gsi_start_phis (epilogue_bbs[i]); + !gsi_end_p (epilogue_phi_gsi); gsi_next (&epilogue_phi_gsi)) + { .. + if (STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo)) + pattern_worklist.safe_push (stmt_vinfo); + + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); + while (related_vinfo && related_vinfo != stmt_vinfo) + { I think PHIs cannot have patterns. You can assert that STMT_VINFO_RELATED_STMT is NULL I think. + related_vinfo = STMT_VINFO_RELATED_STMT (stmt_vinfo); + while (related_vinfo && related_vinfo != stmt_vinfo) + { + related_worklist.safe_push (related_vinfo); + /* Set BB such that the assert in + 'get_initial_def_for_reduction' is able to determine that + the BB of the related stmt is inside this loop. */ + gimple_set_bb (STMT_VINFO_STMT (related_vinfo), + gimple_bb (new_stmt)); + related_vinfo = STMT_VINFO_RELATED_STMT (related_vinfo); + } do we really keep references to "nested" patterns? Thus, do you need this loop? + /* The PATTERN_DEF_SEQs in the epilogue were constructed using the + original main loop and thus need to be updated to refer to the cloned + variables used in the epilogue. */ + for (unsigned i = 0; i < pattern_worklist.length (); ++i) + { ... + op = simplify_replace_tree (op, NULL_TREE, NULL_TREE, + &find_in_mapping, &mapping); + gimple_set_op (seq, j, op); you do this for the pattern-def seq but not for the related one. I guess you ran into this for COND_EXPR conditions. I wondered to use a shared worklist for both the def-seq and the main pattern stmt or at least to split out the replacement so you can share it. + /* Data references for gather loads and scatter stores do not use the + updated offset we set using ADVANCE. Instead we have to make sure the + reference in the data references point to the corresponding copy of + the original in the epilogue. */ + if (STMT_VINFO_GATHER_SCATTER_P (stmt_vinfo)) + { + int j; + if (TREE_CODE (DR_REF (dr)) == MEM_REF) + j = 0; + else if (TREE_CODE (DR_REF (dr)) == ARRAY_REF) + j = 1; + else + gcc_unreachable (); + + if (tree *new_op = mapping.get (TREE_OPERAND (DR_REF (dr), j))) + { + DR_REF (dr) = unshare_expr (DR_REF (dr)); + TREE_OPERAND (DR_REF (dr), j) = *new_op; + } huh, do you really only ever see MEM_REF or ARRAY_REF here? I would guess using simplify_replace_tree is safer. There's also DR_BASE_ADDRESS - we seem to leave the DRs partially updated, is that correct? Otherwise looks OK to me. Thanks, Richard. > gcc/ChangeLog: > 2019-10-25 Andre Vieira <andre.simoesdiasvie...@arm.com> > > PR 88915 > * tree-ssa-loop-niter.h (simplify_replace_tree): Change declaration. > * tree-ssa-loop-niter.c (simplify_replace_tree): Add context parameter > and make the valueize function pointer also take a void pointer. > * gcc/tree-ssa-sccvn.c (vn_valueize_wrapper): New function to wrap > around vn_valueize, to call it without a context. > (process_bb): Use vn_valueize_wrapper instead of vn_valueize. > * tree-vect-loop.c (_loop_vec_info): Initialize epilogue_vinfos. > (~_loop_vec_info): Release epilogue_vinfos. > (vect_analyze_loop_costing): Use knowledge of main VF to estimate > number of iterations of epilogue. > (vect_analyze_loop_2): Adapt to analyse main loop for all supported > vector sizes when vect-epilogues-nomask=1. Also keep track of lowest > versioning threshold needed for main loop. > (vect_analyze_loop): Likewise. > (find_in_mapping): New helper function. > (update_epilogue_loop_vinfo): New function. > (vect_transform_loop): When vectorizing epilogues re-use analysis done > on main loop and call update_epilogue_loop_vinfo to update it. > * tree-vect-loop-manip.c (vect_update_inits_of_drs): No longer insert > stmts on loop preheader edge. > (vect_do_peeling): Enable skip-vectors when doing loop versioning if > we decided to vectorize epilogues. Update epilogues NITERS and > construct ADVANCE to update epilogues data references where needed. > * tree-vectorizer.h (_loop_vec_info): Add epilogue_vinfos. > (vect_do_peeling, vect_update_inits_of_drs, > determine_peel_for_niter, vect_analyze_loop): Add or update declarations. > * tree-vectorizer.c (try_vectorize_loop_1): Make sure to use already > created loop_vec_info's for epilogues when available. Otherwise > analyse > epilogue separately. > > > > Cheers, > Andre > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)