On Wed, 20 Oct 2021, Andre Vieira (lists) wrote: > On 15/10/2021 09:48, Richard Biener wrote: > > On Tue, 12 Oct 2021, Andre Vieira (lists) wrote: > > > >> Hi Richi, > >> > >> I think this is what you meant, I now hide all the unrolling cost > >> calculations > >> in the existing target hooks for costs. I did need to adjust 'finish_cost' > >> to > >> take the loop_vinfo so the target's implementations are able to set the > >> newly > >> renamed 'suggested_unroll_factor'. > >> > >> Also added the checks for the epilogue's VF. > >> > >> Is this more like what you had in mind? > > Not exactly (sorry..). For the target hook I think we don't want to > > pass vec_info but instead another output parameter like the existing > > ones. > > > > vect_estimate_min_profitable_iters should then via > > vect_analyze_loop_costing and vect_analyze_loop_2 report the unroll > > suggestion to vect_analyze_loop which should then, if the suggestion > > was > 1, instead of iterating to the next vector mode run again > > with a fixed VF (old VF times suggested unroll factor - there's > > min_vf in vect_analyze_loop_2 which we should adjust to > > the old VF times two for example and maybe store the suggested > > factor as hint) - if it succeeds the result will end up in the > > list of considered modes (where we now may have more than one > > entry for the same mode but a different VF), we probably want to > > only consider more unrolling once. > > > > For simplicity I'd probably set min_vf = max_vf = old VF * suggested > > factor, thus take the targets request literally. > > > > Richard. > > Hi, > > I now pass an output parameter to finish_costs and route it through the > various calls up to vect_analyze_loop. I tried to rework > vect_determine_vectorization_factor and noticed that merely setting min_vf and > max_vf is not enough, we only use these to check whether the vectorization > factor is within range, well actually we only use max_vf at that stage. We > only seem to use 'min_vf' to make sure the data_references are valid. I am > not sure my changes are the most appropriate here, for instance I am pretty > sure the checks for max and min vf I added in > vect_determine_vectorization_factor are currently superfluous as they will > pass by design, but thought they might be good future proofing?
Ah, ok - well, max_vf is determined by dependence analysis so we do have to check that. I think that + && known_le (unrolled_vf, min_vf)) is superfluous. So if we use min_vf as passed to vect_analyze_loop_2 to carry the suggested unroll factor then the changes look reasonable. What's if (max_vf != MAX_VECTORIZATION_FACTOR - && maybe_lt (max_vf, min_vf)) + && loop_vinfo->suggested_unroll_factor == 1 + && max_vf < estimated_poly_value (min_vf, POLY_VALUE_MAX)) return opt_result::failure_at (vect_location, "bad data dependence.\n"); LOOP_VINFO_MAX_VECT_FACTOR (loop_vinfo) = max_vf; supposed to be though? Why can we allow the unroll case to get past this point? I suppose initializing max_vf to MAX_VECTORIZATION_FACTOR doesn't play well with poly-ints? That is, how does estimated_poly_value (min_vf, POLY_VALUE_MAX) differ here? I would have expected that we can drop the max_vf != MAX_VECTORIZATION_FACTOR part but otherwise leave the test the same? Note you adjust the vectorization factor in vect_determine_vectorization_factor but I think you have to delay that until after vect_update_vf_for_slp which means doing it before the start_over: label. @@ -3038,6 +3203,18 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) if (res) { + /* Only try unrolling main loops. */ + if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo)) + { + opt_loop_vec_info unrolled_vinfo = + vect_try_unrolling (loop_vinfo, &n_stmts, + suggested_unroll_factor); + if (unrolled_vinfo) + loop_vinfo = unrolled_vinfo; + /* Reset suggested_unroll_factor for next loop_vinfo. */ + suggested_unroll_factor = 1; + } so it looks like this eventually will leak 'loop_vinfo' but it also looks like you throw away the not unrolled analysis and costing. I was suggesting to simply add the unrolled variant to the alternatives considered - the flow of vect_analyze_loop is already quite complicated and the changes following this hunk do not help :/ (and I'm not sure what vect_reanalyze_as_main_loop is supposed to do or why we need to consider unrolling there as well). If we don't want to globally consider the unrolled variant maybe we can at least decide between the unrolled and not unrolled variant in vect_try_unrolling? But of course only the whole series, (unrolled) main + vectorized epilogues, determine the final cost. That said, the overall flow is OK now, some details about the max_vf check and where to compute the unrolled VF needs to be fleshed out. And then there's the main analysis loop which, frankly, is a mess right now, even before your patch :/ I'm thinking of rewriting the analysis loop in vect_analyze_loop to use a worklist initially seeded by the vector_modes[] but that we can push things like as-main-loop, unrolled and epilogue analysis to. Maybe have the worklist specify pairs of mode and kind or tuples of mode, min-VF and kind where 'kind' is as-main/epilogue/unroll (though maybe 'kind' is redundant there). Possibly as preparatory step. > Also I changed how we compare against max_vf, rather than relying on the > 'MAX_VECTORIZATION' I decided to use the estimated_poly_value with > POLY_VALUE_MAX, to be able to bound it further in case we have knowledge of > the VL. I am not entirely about the validity of this change, maybe we are > better off keeping the MAX_VECTORIZATION in place and not making any changes > to max_vf for unrolling. Yeah, I guess that would simplify things. The only bit we really want to make sure is that we don't re-analyze with exactly the same VF, so + if (vect_analyze_loop_2 (unrolled_vinfo, unrolling_fatal, n_stmts, + &suggested_unroll_factor, + unrolled_vf) + && known_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo), + LOOP_VINFO_VECT_FACTOR (unrolled_vinfo))) the VF check here should be always true. I was initially considering to take a target hit of unrolling by 4 as to also allow two times unrolling if max_vf constrains us this way but of course never unroll more than 4 times. But we can play with adjustments here later (also considering #pragma unroll, etc.) Richard, do you have any ideas on how to improve the iteration mess we have? Richard.