[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2022-03-08 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

--- Comment #5 from Kewen Lin ---
(In reply to Richard Biener from comment #3)
> On x86 we even have
>
>   Vector cost: 136
>   Scalar cost: 196
>
> note that we seem to vectorize the reduction but that only happens with
> -ffast-math, not -O2

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

--- Comment #4 from Richard Biener ---
Note that vectorizer costing does not look at dependencies at all; it just
sums up individual instruction latencies (and assumes unlimited throughput
as well).
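
To illustrate the point in comment #4, here is a minimal sketch of why a cost that only sums latencies cannot tell a long dependency chain from a short one. It is a toy model, not GCC's cost code: the instruction names, the latencies, and the helpers `summed_cost` and `critical_path` are all hypothetical.

```python
# Toy model only: names and latencies are made up, not taken from GCC.
# Each entry maps an instruction to (latency, list of instructions it needs).

def summed_cost(instrs):
    """Add up every latency, ignoring dependencies entirely."""
    return sum(latency for latency, _ in instrs.values())

def critical_path(instrs):
    """Length of the longest latency chain through the dependency graph."""
    memo = {}

    def finish(name):
        if name not in memo:
            latency, deps = instrs[name]
            memo[name] = latency + max((finish(d) for d in deps), default=0)
        return memo[name]

    return max(finish(name) for name in instrs)

# Four independent multiplies feeding a serial chain of adds.
serial = {
    "m0": (4, []), "m1": (4, []), "m2": (4, []), "m3": (4, []),
    "a0": (2, ["m0", "m1"]),
    "a1": (2, ["a0", "m2"]),
    "a2": (2, ["a1", "m3"]),
}

# The same instructions arranged as a balanced tree of adds.
tree = {
    "m0": (4, []), "m1": (4, []), "m2": (4, []), "m3": (4, []),
    "a0": (2, ["m0", "m1"]),
    "a1": (2, ["m2", "m3"]),
    "a2": (2, ["a0", "a1"]),
}

print(summed_cost(serial), summed_cost(tree))      # 22 22
print(critical_path(serial), critical_path(tree))  # 10 8
```

Both arrangements get the same summed cost, yet the serial chain has the longer critical path, which is the kind of difference a sum of latencies cannot see.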

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

--- Comment #3 from Richard Biener ---
On x86 we even have

  Vector cost: 136
  Scalar cost: 196

note that we seem to vectorize the reduction but that only happens with
-ffast-math, not -O2 -ftree-slp-vectorize? One issue is the association o
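
As background for the -ffast-math remark, a small sketch of why reassociating a floating-point sum is not value-preserving, which is the usual reason compilers require relaxed math flags before vectorizing such a reduction. This is an illustration under that assumption, not a reconstruction of the truncated comment; the helpers `sum_in_order` and `sum_two_lanes` and the input values are hypothetical.

```python
# Illustration only: hypothetical helpers, not code from the bug report.

def sum_in_order(xs):
    """Strict left-to-right sum, as written in scalar source code."""
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_two_lanes(xs):
    """Sum even and odd elements separately, then combine the two lanes.

    Mimics a 2-lane vector reduction; assumes len(xs) is even.
    """
    lane0 = 0.0
    lane1 = 0.0
    for i in range(0, len(xs), 2):
        lane0 += xs[i]
        lane1 += xs[i + 1]
    return lane0 + lane1

values = [1e16, 1.0, -1e16, 1.0]

print(sum_in_order(values))   # 1.0 (the first 1.0 is absorbed by 1e16)
print(sum_two_lanes(values))  # 2.0 (the large terms cancel within lane 0)
```

The two orders give different results for the same inputs, so a compiler that must preserve the source-order result cannot reassociate the sum without being told that is acceptable.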

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

--- Comment #2 from Kewen Lin ---
Back to the optimized IR, I think the problem is that the vectorized version
has a longer critical path for the reduc_plus result (latency in total).

For the vectorized version,

  _51 = diffa_41(D) * 1.

[Bug tree-optimization/101944] suboptimal SLP for reduced case from namd_r

2021-08-17 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944

--- Comment #1 from Kewen Lin ---
The original costing shows the vectorized version wins. Checking the
costings shows that it fails to model the cost of lane extraction. The patch
was posted in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/57