Assuming a fully pipelined vector unit (and from experience on AArch64), a uarch's scalar-to-vector move cost is likely to play a significant role in whether this is profitable.
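
As a rough illustration of that trade-off, here is a minimal sketch in C of the comparison a cost model might make between the two sequences discussed in this thread. The struct, its field names, and the function are hypothetical and are not part of GCC's existing costing structures; the cycle counts would have to come from each uarch's tuning data.

```c
#include <stdbool.h>

/* Hypothetical per-uarch costs in cycles.  None of these fields exist
   in GCC today; they only illustrate what would need describing.  */
struct uarch_vec_costs
{
  int scalar_load;    /* load the constant into a GPR */
  int scalar_to_vec;  /* vmv.v.x: GPR to vector register move */
  int vec_imm_splat;  /* vmv.v.i */
  int vec_shift_imm;  /* vsll.vi */
};

/* Return true if building the constant on the vector side
   (vmv.v.i + vsll.vi) is estimated to cost no more than
   scalar load + vmv.v.x.  */
static bool
prefer_vector_side (const struct uarch_vec_costs *c)
{
  int scalar_path = c->scalar_load + c->scalar_to_vec;
  int vector_path = c->vec_imm_splat + c->vec_shift_imm;
  return vector_path <= scalar_path;
}
```

A simple sum like this ignores issue width and how well an OoO core hides scalar latency, so it is only a starting point for the discussion, not a proposal for the actual hook.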
--Philipp.

On Wed, 31 May 2023 at 00:10, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On 5/30/23 16:01, 钟居哲 wrote:
> > I agree with Andrew.
> >
> > And I don't think this patch is appropriate for the following reasons:
> > 1. This patch increases the vector workload in the machine, since
> >    it converts scalar load + vmv.v.x into vmv.v.i + vsll.vi.
> This is probably uarch dependent.  I can probably construct cases where
> the first will be better and I can probably construct cases where the
> latter will be better.  In fact the recommendation from our uarch team
> is to generally do this stuff on the vector side.
>
> >
> > 2. For a multi-issue OoO machine, scalar instructions are very cheap
> >    when they are located in vector codegen. For example, a sequence
> >    like this:
> >      scalar insn
> >      scalar insn
> >      vector insn
> >      scalar insn
> >      vector insn
> >      ....
> >    In such a situation, we can issue multiple instructions
> >    simultaneously, and the latency of the scalar instructions will be
> >    hidden, so the scalar instructions are cheap. Whereas this patch,
> >    by increasing the vector pipeline workload, is not friendly to the
> >    OoO machine I mentioned above.
> I probably need to be careful what I say here :-)  I'll go with: mixing
> vector/scalar code may incur certain penalties on some
> microarchitectures depending on the exact code sequences involved.
>
> >
> > 3. I can imagine the only benefit of this patch is that we can reduce
> >    scalar register pressure in some extreme circumstances. However, I
> >    don't think this benefit is "real", since GCC should schedule the
> >    instruction sequence well once we have tuned the vector instruction
> >    scheduling model and cost model, making such register live ranges
> >    very short when the scalar register pressure is very high.
> >
> > Overall, I disagree with this patch.
> What I think this all argues is that it'll likely need to be uarch
> dependent.  I'm not yet sure how to describe the properties of the
> uarch in a concise manner to put into our costing structure though.
>
> jeff