https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #12 from Richard Biener ---
(In reply to Jan Hubicka from comment #11)
> trunk -O3 -flto -march=native -fopenmp
> Operation: Sharpen:
> 257
> 256
> 256
>
> Average: 256 Iterations Per Minute
> GCC13 -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #11 from Jan Hubicka ---
trunk -O3 -flto -march=native -fopenmp
Operation: Sharpen:
257
256
256
Average: 256 Iterations Per Minute
GCC13 -O3 -flto -march=native -fopenmp
257
256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #10 from Richard Biener ---
We now also apply SLP when vectorizing the loop, but as said the high VF is
probably prohibitive and causes quite some spilling:
.L7:
vmovdqu (%r14), %ymm2
vmovdqu 32(%r14), %ymm1
subq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #9 from Richard Biener ---
Note that SLPing k[u] won't help to reduce the VF; only selecting a smaller
vector size would. The alternative is to have a power-of-two group size by
using masking for the 'opacity' field.
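The group-size remark can be pictured with a small C sketch. The test case (t.c) is not shown in these snippets, so everything here is an assumption made for illustration: the `pixel_t` layout, the `convolve` function, and its parameters are hypothetical, and only the names `k[u]`, `red`, `green` and `opacity` come from the comments. Assuming three channels are accumulated and `opacity` is skipped, the group has size 3, which is not a power of two.

```c
#include <stddef.h>

/* Hypothetical pixel layout (assumption, not taken from t.c):
   four 8-bit channels, of which the loop below reads only three. */
typedef struct {
    unsigned char red, green, blue, opacity;
} pixel_t;

/* Hypothetical convolution step: accumulate red, green and blue,
   each weighted by k[u]. The three accumulators form a group of
   size 3; the unused 'opacity' byte is the gap in every pixel. */
void convolve(const pixel_t *p, const float *k, size_t width,
              float *red, float *green, float *blue)
{
    float r = 0.0f, g = 0.0f, b = 0.0f;
    for (size_t u = 0; u < width; u++) {
        r += k[u] * p[u].red;
        g += k[u] * p[u].green;
        b += k[u] * p[u].blue;
    }
    *red = r;
    *green = g;
    *blue = b;
}
```

Under these assumptions, treating the `opacity` lane as a masked fourth group member would give a group of size 4.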
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #8 from Richard Biener ---
Since r14-2007-g6f19cf7526168f we now vectorize the loop, but without SLP,
which means we get interleaving and a vectorization factor of 64. Turning
off loop vectorization yields the following which is now c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #7 from Hongtao.liu ---
> pixel$red_60(D)(10)>, type of def: internal
> t.c:18:27: missed: no optab.
> t.c:18:27: missed: not vectorized: relevant stmt not supported: _29 =
> (unsigned char) pixel$red_78;
> t.c:18:27: note: Bu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #6 from Richard Biener ---
Btw, we would also be able to vectorize just the red and green channels:
t.c:18:27: note: * Analysis succeeded with vector mode V4SF
t.c:18:27: note: SLPing BB part
t.c:18:27: note: Costing subgraph:
t.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #5 from Jan Hubicka ---
In sharpening, the number of iterations depends on the sharpen radius. I am not
sure what it is for the benchmark, but in normal situations the number of
iterations is indeed not very large.
However clang simply slp ve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Richard Biener changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Jan Hubicka changed:
What|Removed |Added
Status|WAITING |NEW
--- Comment #3 from Jan Hubicka ---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
Richard Biener changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #1 from Hongtao.liu ---
One of the vectorizer issues is related to PR110018.