https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123086
Bug ID: 123086
Summary: RISC-V possible optimization of fp move instruction
in SpacemiT-x60 tuning
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: [email protected]
Target Milestone: ---
Target: riscv
Created attachment 63031
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=63031&action=edit
diff for fmv=3
While benchmarking a new spacemit-x60 tuning configuration on the RVV
benchmark, I found unexpected fmv instructions generated by GCC in the
mandelbrot_scalar_f32 example. The issue appears to be related to instruction
scheduling for fmv.
The compiler produces an unnecessary move inside a loop:
        fmv.s   fa4,fa3         // unnecessary move
Relevant part of the generated assembly:
.L6:
        fsub.s  fa5,fa5,fa4
        addi    a5,a5,1
        fmv.s   fa4,fa3         // <- unnecessary
        fadd.s  fa3,fa5,fa1
        fadd.s  fa4,fa4,fa4
        fmul.s  fa5,fa3,fa3
        fmadd.s fa2,fa4,fa2,ft0
        fmul.s  fa4,fa2,fa2
        fadd.s  fa0,fa5,fa4
        fle.s   a4,fa0,ft1
        bne     a4,zero,.L22
When I modify the tuning description (lowering the fmv latency from 4 to 3),
GCC stops inserting the unnecessary fmv.
RVV benchmark (mandelbrot_scalar_f32), metric is bytes per cycle:
INPUT    10         100        1000       10000      100000     1000000
FMV=4    0.0087719  0.0016358  0.0008379  0.0009498  0.0009138  0.0009143
FMV=3    0.0094339  0.0019402  0.0009999  0.0011320  0.0010906  0.0010913
% diff   7.017246%  15.68910%  16.20162%  16.09540%  16.21125%  16.21918%
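We got approximately a 16% improvement for inputs of 100 bytes and above on
RVV mandelbrot_scalar_f32 compared to the fmv=4 version. The % diff row is
computed relative to the fmv=3 figure; relative to the fmv=4 baseline the gain
is about 19%.
Diff: https://www.diffchecker.com/XmrzttMk/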
Second minimal reproducer (much simpler)
Code:
float mandelbrot_scalar_f32_reduced()
{
    float zx = 1, zy = 0, zxS = 0;
    while (zxS<77){
        zxS = zy + zx + zx;
        zy = zx + zx;
        zx = zxS;
    }
    return zxS;
}
Generated assembly contains another unnecessary move:
.L2:
        fadd.s  fa4,fa0,fa5
        fmv.s   fa5,fa0         // <- unnecessary
        fadd.s  fa0,fa0,fa4
        fadd.s  fa5,fa5,fa5
        flt.s   a5,fa0,fa3
        bne     a5,zero,.L2
        ret
Adjusting the latency of fmv does not remove this redundant instruction, unlike
in the first example.
Full snippet: https://godbolt.org/z/6KaYdPahM
Compiler options: -O3 -mtune=spacemit-x60 -march=rv64gcb