Hi Richard,

on 2020/7/2 1:20 PM, Kewen.Lin via Gcc-patches wrote:
> on 2020/7/1 11:17 PM, Richard Sandiford wrote:
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> on 2020/7/1 3:53 AM, Richard Sandiford wrote:
>>>> "Kewen.Lin" <li...@linux.ibm.com> writes:

[...]

>> Hmm, OK.  But in that case can you update the names of the variables
>> to match?  It's confusing to have some nscalars_* variables actually
>> count scalars (and thus have “nitems” equivalents) and other nscalars_*
>> variables count something else (and thus effectively be nitems_*
>> variables themselves).
>>
>
> OK.  I'll update the names like nscalars_total/nscalars_step and
> equivalents to nitems_total/... (or nunits_total better?)
>
Please ignore this part, I have used nitems_ for the names. :)

>>>>> +  /* Work out how many bits we need to represent the length limit.  */
>>>>> +  unsigned int nscalars_per_iter_ft = rgl->max_nscalars_per_iter *
>>>>> rgl->factor;
>>>>
>>>> I think this breaks the abstraction.  There's no guarantee that the
>>>> factor is the same for each rgroup_control, so there's no guarantee
>>>> that the maximum bytes per iter comes from the last entry.  (Also, it'd
>>>> be better to avoid talking about bytes if we're trying to be general.)
>>>> I think we should take the maximum of each entry instead.
>>>>
>>>
>>> Agree!  I guess the above "maximum bytes per iter" is a typo, and you
>>> meant "maximum elements per iter"?  Yes, the code is for length in
>>> bytes, so checking the last entry is only reasonable for that.  Will
>>> update it to check all entries instead.
>>
>> I meant bytes, since that's what the code is effectively calculating
>> (at least for Power).  I.e. I think this breaks the abstraction even
>> if we assume the Power scheme of measuring length, since in principle
>> it's possible to mix different vector sizes in the same vector region.
>>
>
> Sorry, I didn't catch the meaning of "it's possible to mix different
> vector sizes in the same vector region".  I guess if we are counting
> bytes, the max nunits per iteration should come from the last entry,
> since the last one holds the max bytes, which is the result of
> max_nscalars_per_iter * factor.  But I agree that it breaks the
> abstraction here, since it doesn't apply to length in lanes.
>

On further thought, I guess you meant we can have different vector
sizes for the same loop in the future?  Yes, the assumption doesn't
hold then.

>>>>> +  /* Decide whether to use fully-masked approach.  */
>>>>> +  if (vect_verify_full_masking (loop_vinfo))
>>>>> +    LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>>>>> +  /* Decide whether to use length-based approach.  */
>>>>> +  else if (vect_verify_loop_lens (loop_vinfo))
>>>>> +    {
>>>>> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>>>>> +          || LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
>>>>> +        {
>>>>> +          if (dump_enabled_p ())
>>>>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>>>> +                             "can't vectorize this loop with length-based"
>>>>> +                             " partial vectors approach because peeling"
>>>>> +                             " for alignment or gaps is required.\n");
>>>>> +          LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
>>>>> +        }
>>>>
>>>> Why are these peeling cases necessary?  Peeling for gaps should
>>>> just mean subtracting one scalar iteration from the iteration count
>>>> and shouldn't otherwise affect the main loop.  Similarly, peeling for
>>>> alignment can be handled in the normal way, with a scalar prologue loop.
>>>>
>>>
>>> I was thinking to relax this later, to avoid handling too many cases
>>> in the first enablement patch.  Since Power hardware at the level that
>>> supports vector with length also supports unaligned load/store, we
>>> would need to construct some cases for them.  May I postpone it a bit?
>>> Or do you prefer me to support it here?
>>
>> I've no objection to postponing it if there are specific known
>> problems that make it difficult, but I think we should at least
>> say what they are.  On the face of it, I'm not sure why it doesn't
>> Just Work, since the way that we control the main loop should be
>> mostly orthogonal to how we handle peeled prologue iterations
>> and how we handle a single peeled epilogue iteration.
>>
>
> OK, I will remove it to see the impact.  By the way, do you think using
> partial vectors for the prologue is something worth trying in the
> future?
>

I tested the updated patch with this restriction relaxed; the
LOOP_VINFO_PEELING_FOR_GAPS part looks fine, but
LOOP_VINFO_PEELING_FOR_ALIGNMENT caused one case to fail at execution
with vect-partial-vector-usage=2.  So far the patch doesn't handle any
niters_skip cases.
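As an aside on the earlier rgroup point: the suggestion to take the
maximum of max_nscalars_per_iter * factor over every entry, rather than
trusting the last one, could be sketched roughly as below.  The struct
and function names here are hypothetical simplifications for
illustration, not GCC's actual rgroup_controls type:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for the vectorizer's per-rgroup data;
// the field names mirror the patch, but this is not GCC's real type.
struct rgroup_info
{
  unsigned int max_nscalars_per_iter;
  unsigned int factor; // e.g. bytes per item for length-in-bytes targets
};

// Take the maximum of max_nscalars_per_iter * factor over every rgroup
// entry.  Since the factor can differ between rgroups, the largest value
// is not guaranteed to come from the last entry.
unsigned int
max_nitems_per_iter (const std::vector<rgroup_info> &rgroups)
{
  unsigned int result = 0;
  for (const rgroup_info &rgl : rgroups)
    result = std::max (result, rgl.max_nscalars_per_iter * rgl.factor);
  return result;
}
```

For example, with entries {4, 8}, {16, 1}, {2, 4}, the last entry only
yields 8, while the true maximum (32) comes from the first.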
I think if we want to support it, we have to add some handling like
what we have for masking, such as mask_skip_niters,
vect_prepare_for_masked_peels, etc.  Do you prefer me to extend the
support in this patch series?

>>> Sorry, I might be missing something, but all undetermined lengths are
>>> generated here; are the other places you meant the doc or elsewhere?
>>
>> For example, we'd need to start querying the length operand of the optabs
>> to see what length precision the target uses, since it would be invalid
>> to do this optimisation for IVs that are wider than that precision.
>> The routine above doesn't seem the right place to do that.
>>
>
> OK, but it seems it's acceptable if the IV is wider than the precision,
> since we allow it to be out of range?
>

Please ignore this question; I agree that we have to avoid that case.
Sorry that I was misunderstanding it before. :)

BR,
Kewen