https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101944
Andrew Pinski changed:

           What    |Removed |Added
----------------------------------------------------------
           Severity|normal  |enhancement
--- Comment #5 from Kewen Lin ---
(In reply to Richard Biener from comment #3)
> On x86 we even have
>
> Vector cost: 136
> Scalar cost: 196
>
> note that we seem to vectorize the reduction but that only happens with
> -ffast-math, not -O2
--- Comment #4 from Richard Biener ---
Note that vectorizer costing does not look at dependencies at all; it just sums
up individual instruction latencies (and assumes unlimited throughput as well).
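As an aside for readers following the thread: a hedged sketch, not GCC's actual
cost code (the struct and functions below are hypothetical), of why summing
latencies and measuring the dependence chain give different answers:

  /* Hypothetical model: each "instruction" has a latency and at most one
     producer it waits on, listed in dependence order.  */
  struct insn
  {
    int latency;     /* cycles this instruction takes            */
    int depends_on;  /* index of the producing insn, -1 if none  */
  };

  /* Sum of latencies: ignores dependencies, like the costing described above.  */
  static int
  summed_cost (const struct insn *insns, int n)
  {
    int cost = 0;
    for (int i = 0; i < n; i++)
      cost += insns[i].latency;
    return cost;
  }

  /* Critical path: finish time of the longest dependence chain, which is
     what actually limits a serial reduction.  */
  static int
  critical_path (const struct insn *insns, int n)
  {
    int finish[64] = { 0 };
    int worst = 0;
    for (int i = 0; i < n && i < 64; i++)
      {
        int start = insns[i].depends_on >= 0 ? finish[insns[i].depends_on] : 0;
        finish[i] = start + insns[i].latency;
        if (finish[i] > worst)
          worst = finish[i];
      }
    return worst;
  }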
--- Comment #3 from Richard Biener ---
On x86 we even have
Vector cost: 136
Scalar cost: 196
Note that we seem to vectorize the reduction, but that only happens with
-ffast-math, not -O2 -ftree-slp-vectorize?
One issue is the association o
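As background for the -ffast-math point: vectorizing a floating-point sum
reduction means reassociating one serial chain of adds into per-lane partial
sums plus a final horizontal add, which changes rounding and is therefore only
done under -ffast-math (or -fassociative-math). A minimal reduction of that
shape, not the testcase of this PR, looks like:

  /* Serial FP sum; every iteration depends on the previous value of s,
     so vectorizing it requires reassociating the additions.  */
  double
  sum (const double *a, int n)
  {
    double s = 0.0;
    for (int i = 0; i < n; i++)
      s += a[i];
    return s;
  }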
--- Comment #2 from Kewen Lin ---
Back to the optimized IR, I thought the problem is that the vectorized version
has a longer critical path for the reduc_plus result (higher latency in total).
For the vectorized version,
  _51 = diffa_41(D) * 1.
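To make the critical-path argument concrete, here is a hedged 2-lane sketch in
generic GNU C vector-extension syntax (not the PR's actual IR or testcase): the
reduc_plus result has to wait for the lane extracts and the final scalar add,
so those latencies are added on top of the vector body's chain.

  typedef double v2df __attribute__ ((vector_size (16)));

  /* Hypothetical reduction tail of a 2-lane vectorized sum.  */
  double
  reduce_tail (v2df v)
  {
    double lane0 = v[0];    /* lane extraction                        */
    double lane1 = v[1];    /* lane extraction                        */
    return lane0 + lane1;   /* serial add producing the reduc_plus
                               result; waits on both extracts         */
  }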
--- Comment #1 from Kewen Lin ---
The original costing showed the vectorized version winning; checking the
costings, it failed to model the cost of lane extraction. The patch was
posted in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/57
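For anyone wanting to reproduce the "Vector cost:" / "Scalar cost:" lines
quoted earlier in the thread, they come from the vectorizer's cost model dump;
a typical invocation (file name illustrative, dump pass names can differ
between GCC versions) is:

  gcc -O2 -ffast-math -ftree-slp-vectorize -fdump-tree-slp-details test.c
  # the cost comparison appears under "Cost model analysis" in test.c.*.slp*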