https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #20 from rguenther at suse dot de ---
On Mon, 13 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #19 from Kewen Lin ---
> (In reply to rguent...@suse.de from comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #19 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #18)
> I think this misses a :s on the negate_expr_p, but I'm not sure this
> "works", so eventually && single_use (@1), given the original expression
> doesn't go awa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #18 from rguenther at suse dot de ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #17 from Kewen Lin ---
> ccp1:
>
> t0_83 = a0_79 + a1_80;
> t1_84
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #17 from Kewen Lin ---
ccp1:
t0_83 = a0_79 + a1_80;
t1_84 = a0_79 - a1_80;
t2_85 = a2_81 + a3_82;
t3_86 = a2_81 - a3_82;
_63 = t0_83 + t2_85;
tmp[i_71][0] = _63;
_64 = t0_83 - t2_85;
tmp[i_71][2] = _64;
_65 = t1_84
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #16 from rguenther at suse dot de ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #15 from Kewen Lin ---
> It looks r15-2820-gab18785840d7b8 has made the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #15 from Kewen Lin ---
It looks like r15-2820-gab18785840d7b8 has made the case in #c1 vectorized, nice!
But CPUBench uses an unsigned type in HADAMARD4:
#if BIT_DEPTH > 8
typedef uint32_t sum_t;
typedef uint64_t sum2_t;
#else
ty
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #14 from Richard Biener ---
Btw, previous work is at refs/users/rguenth/heads/load-perm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #13 from Richard Biener ---
(In reply to Jiangning Liu from comment #12)
> Hi Richi,
>
> > That said, "failure" to identify the common (vector) load is known
> > and I do have experimental patches trying to address that but did
> > n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #12 from Jiangning Liu ---
Hi Richi,
> That said, "failure" to identify the common (vector) load is known
> and I do have experimental patches trying to address that but did
> not yet arrive at a conclusive "best" approach.
It was
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
manolis.tsamis at vrull dot eu changed:
What|Removed |Added
CC||manolis.tsamis at vrull d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
ktkachov at gcc dot gnu.org changed:
What|Removed |Added
Last reconfirmed||2022-07-06
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #9 from Richard Biener ---
The full satd_8x4 looks like the following; the 2nd loop isn't to be
disregarded:
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
#define HADAMARD4(d0, d1, d2,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #8 from Kewen Lin ---
Created attachment 49942
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49942&action=edit
vectorized with altivec built-in functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #7 from Kewen Lin ---
(In reply to Richard Biener from comment #6)
> Starting from the loads is not how SLP discovery works so there will be
> zero re-use of code. Sure - the only important thing is you end up
> with a valid SLP grap
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #6 from Richard Biener ---
Starting from the loads is not how SLP discovery works so there will be
zero re-use of code. Sure - the only important thing is you end up
with a valid SLP graph.
But going back to the original testcase an
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #5 from Kewen Lin ---
(In reply to Kewen Lin from comment #4)
> One rough idea seems:
> 1) Relax this condition all_uniform_p somehow to get SLP instance building
> to go deeper and get those p1/p2 loads as SLP nodes.
> 2) Introdu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #4 from Kewen Lin ---
(In reply to Kewen Lin from comment #3)
>
> IIUC, in current implementation, we get four grouped stores:
> { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] } /i=0,1,2,3/ independently
>
> When all these tryings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #3 from Kewen Lin ---
(In reply to Richard Biener from comment #2)
> So the expected vectorization builds vectors
>
> { tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] }
>
> that's not SLP, SLP tries to build the
>
> { tmp[i][0], tmp[
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #2 from Richard Biener ---
So the expected vectorization builds vectors
{ tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] }
that's not SLP, SLP tries to build the
{ tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] }
vector and "succeeds" -
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #1 from Kewen Lin ---
A similar case is x264_pixel_satd_8x4 in x264:
https://github.com/mirror/x264/blob/4121277b40a667665d4eea1726aefdc55d12d110/common/pixel.c#L288
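For readers skimming the thread, here is a minimal C sketch of the kernel shape under discussion. It is not the testcase from the bug. The first six statements of the loop body follow the ccp1 dump quoted in comment #17; the last two stores are assumed from the usual 4-point Hadamard butterfly, since the dump is cut off in this listing. The names hadamard4_rows and column_lane, and the choice of uint32_t for sum2_t, are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 32-bit unsigned element type, picked only for this sketch. */
typedef uint32_t sum2_t;

/* One 4-point butterfly per row i. The stores to tmp[i][0..3] form the
   per-row group { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] } that
   comment #2 says SLP tries to build. */
static void hadamard4_rows(sum2_t tmp[4][4], sum2_t a[4][4])
{
    for (int i = 0; i < 4; i++) {
        sum2_t t0 = a[i][0] + a[i][1];
        sum2_t t1 = a[i][0] - a[i][1];
        sum2_t t2 = a[i][2] + a[i][3];
        sum2_t t3 = a[i][2] - a[i][3];
        tmp[i][0] = t0 + t2;
        tmp[i][2] = t0 - t2;
        tmp[i][1] = t1 + t3; /* assumed: not visible in the truncated dump */
        tmp[i][3] = t1 - t3; /* assumed: not visible in the truncated dump */
    }
}

/* Gathers { tmp[0][j], tmp[1][j], tmp[2][j], tmp[3][j] }, i.e. the
   across-rows vector that comment #2 calls the expected vectorization. */
static void column_lane(sum2_t out[4], sum2_t tmp[4][4], int j)
{
    assert(j >= 0 && j < 4);
    for (int i = 0; i < 4; i++)
        out[i] = tmp[i][j];
}
```

In the per-row group the four lanes mix additions and subtractions, while in the across-rows group every lane performs the same operation on a different row.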