On 11/7/23 21:31, Maxim Blinov wrote:
I see, thanks for clarifying, that makes sense.
In that case, what about doing the inverse? I mean, are there unique
patches in the vendor branch, and would it be useful to try to
upstream them into master? My motivation is to get the best
autovectorized code for RISC-V.
There should be nothing on the vendor branch that is not already on the
trunk. If there is, something has gone horribly wrong.
The process we've used over there is pretty simple. Start with the
gcc-13 branch, then cherry pick risc-v backend & testsuite changes from
the trunk as well as limited target independent changes (primarily those
which the risc-v backend depends on, or which we know/expect are
important for risc-v for one reason or another).
I had a go at building the TSVC benchmark (in the llvm-test-suite[1]
repository) both with the master and vendor branch gcc, and noticed
that the vendor branch gcc generally beats master in generating more
vector instructions.
If I simply count the number of instances of each vector instruction,
averaged across all 36 test cases, the most prominent differences
between vendor and master gcc are:
- vmv.x.s: 48 vs 0 (+ 48)
- vle32.v: 150 vs 50 (+ 100)
- vrgather.vv: 61 vs 0 (+ 61)
- vslidedown.vi: 61 vs 0 (+ 61)
- vse32.v: 472 vs 213 (+ 259)
- vmsgtu.vi: 30 vs 0 (+ 30)
- vadd.vi: 80 vs 30 (+ 50)
- vlm.v: 18 vs 0 (+ 18)
- vsm.v: 16 vs 0 (+ 16)
- vmv4r.v: 21 vs 7 (+ 14)
(For reference, the benchmarks are all between 20k-30k in code size.
Built with `-march=rv64imafdcv -O3`.)
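
A rough sketch of the static counting described above, in Python. It
assumes the input is disassembly text in the tab-separated layout that
GNU `objdump -d` prints (address, encoding, mnemonic, operands), and it
treats any mnemonic starting with "v" as a vector instruction, which is
a heuristic rather than a real decoder. The function names and the
sample lines (including the encodings) are made up for illustration.

```python
from collections import Counter


def count_vector_insns(disassembly: str) -> Counter:
    """Count vector mnemonics in objdump-style disassembly text.

    Assumption: each instruction line has the form
    "<address>:\t<encoding>\t<mnemonic>\t<operands>".
    """
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Skip symbol labels, section headers, and blank lines.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        mnemonic = fields[2].strip()
        # Heuristic: RISC-V vector mnemonics start with "v".
        if mnemonic.startswith("v"):
            counts[mnemonic] += 1
    return counts


def diff_counts(vendor: Counter, master: Counter) -> dict:
    """Return vendor-minus-master counts for mnemonics that differ."""
    names = set(vendor) | set(master)
    return {n: vendor[n] - master[n] for n in names if vendor[n] != master[n]}


# Illustrative input only; the encodings are placeholders, not real ones.
SAMPLE_VENDOR = (
    "0000000000010078 <kernel>:\n"
    "   10078:\t00000000          \tvsetvli\tt0,a0,e32,m1,ta,ma\n"
    "   1007c:\t00000000          \tvle32.v\tv8,(a1)\n"
    "   10080:\t00000000          \tvle32.v\tv9,(a2)\n"
    "   10084:\t00000000          \tadd\ta0,a0,a1\n"
)
SAMPLE_MASTER = (
    "0000000000010078 <kernel>:\n"
    "   10078:\t00000000          \tvsetvli\tt0,a0,e32,m1,ta,ma\n"
    "   1007c:\t00000000          \tvle32.v\tv8,(a1)\n"
    "   10080:\t00000000          \tadd\ta0,a0,a1\n"
)

if __name__ == "__main__":
    vendor = count_vector_insns(SAMPLE_VENDOR)
    master = count_vector_insns(SAMPLE_MASTER)
    for name, delta in sorted(diff_counts(vendor, master).items()):
        print(f"{name}: {vendor[name]} vs {master[name]} ({delta:+d})")
```

In practice the two strings would come from disassembling the same
benchmark built with each compiler. Deriving the difference column from
the two counts this way also keeps it consistent with them.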
Of course that doesn't say anything about performance, but would it be
possible/fair to say that the vendor branch may still be better than
master for generating vectorized code for RISC-V?
What's interesting is that there's very little "regression" - I saw
only a few cases where the vendor branch removed a vector
instruction compared to master gcc (the instruction most often
removed by the vendor branch, compared to master, is
vsetvl/vsetvli).
If the vendor branch is generating better code than the trunk then
that's an indication that target independent changes on the trunk from
the gcc-14 development cycle need some work ;)
Just comparing the static number of instructions isn't useful at all
IMHO. You can, however, get dynamic instruction counts from various
QEMU plugins, at which point the data becomes much more interesting,
though you have to be careful interpreting that as well.
Jeff