[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-12 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #20 from rguenther at suse dot de ---
On Mon, 13 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #19 from Kewen Lin ---
> (In reply to rguent...@suse.de from comment #1

[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #19 from Kewen Lin ---
(In reply to rguent...@suse.de from comment #18)
> I think this misses a :s on the negate_expr_p, but I'm not sure this
> "works", so eventually && single_use (@1), given the original expression
> doesn't go awa

[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-10 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #18 from rguenther at suse dot de ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #17 from Kewen Lin ---
> ccp1:
>
> t0_83 = a0_79 + a1_80;
> t1_84

[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-10 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #17 from Kewen Lin ---
ccp1:
  t0_83 = a0_79 + a1_80;
  t1_84 = a0_79 - a1_80;
  t2_85 = a2_81 + a3_82;
  t3_86 = a2_81 - a3_82;
  _63 = t0_83 + t2_85;
  tmp[i_71][0] = _63;
  _64 = t0_83 - t2_85;
  tmp[i_71][2] = _64;
  _65 = t1_84

[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-10 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #16 from rguenther at suse dot de ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #15 from Kewen Lin ---
> It looks r15-2820-gab18785840d7b8 has made the

[Bug tree-optimization/98138] BB vect fail to SLP one case

2025-01-10 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #15 from Kewen Lin ---
It looks r15-2820-gab18785840d7b8 has made the case in #c1 vectorized, nice! But CPUBench has unsigned type in HADAMARD4:
  #if BIT_DEPTH > 8
  typedef uint32_t sum_t;
  typedef uint64_t sum2_t;
  #else
  ty

[Bug tree-optimization/98138] BB vect fail to SLP one case

2023-10-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #14 from Richard Biener --- Btw, previous work is at refs/users/rguenth/heads/load-perm

[Bug tree-optimization/98138] BB vect fail to SLP one case

2023-10-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #13 from Richard Biener ---
(In reply to Jiangning Liu from comment #12)
> Hi Richi,
>
> > That said, "failure" to identify the common (vector) load is known
> > and I do have experimental patches trying to address that but did
> > n

[Bug tree-optimization/98138] BB vect fail to SLP one case

2023-10-04 Thread jiangning.liu at amperecomputing dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #12 from Jiangning Liu ---
Hi Richi,
> That said, "failure" to identify the common (vector) load is known
> and I do have experimental patches trying to address that but did
> not yet arrive at a conclusive "best" approach.
It was

[Bug tree-optimization/98138] BB vect fail to SLP one case

2023-02-01 Thread manolis.tsamis at vrull dot eu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
manolis.tsamis at vrull dot eu changed:
  What |Removed |Added
  CC   |        |manolis.tsamis at vrull d

[Bug tree-optimization/98138] BB vect fail to SLP one case

2022-07-06 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
ktkachov at gcc dot gnu.org changed:
  What             |Removed |Added
  Last reconfirmed |        |2022-07-06

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-08-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #9 from Richard Biener ---
The full satd_8x4 looks like the following, the 2nd loop isn't to be disregarded
  typedef unsigned char uint8_t;
  typedef unsigned short uint16_t;
  typedef unsigned int uint32_t;
  #define HADAMARD4(d0, d1, d2,

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #8 from Kewen Lin --- Created attachment 49942 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49942&action=edit vectorized with altivec built-in functions

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #7 from Kewen Lin ---
(In reply to Richard Biener from comment #6)
> Starting from the loads is not how SLP discovery works so there will be
> zero re-use of code. Sure - the only important thing is you end up
> with a valid SLP grap

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138 --- Comment #6 from Richard Biener --- Starting from the loads is not how SLP discovery works so there will be zero re-use of code. Sure - the only important thing is you end up with a valid SLP graph. But going back to the original testcase an

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #5 from Kewen Lin ---
(In reply to Kewen Lin from comment #4)
> One rough idea seems:
> 1) Relax this condition all_uniform_p somehow to get SLP instance building
> to go deeper and get those p1/p2 loads as SLP nodes.
> 2) Introdu

[Bug tree-optimization/98138] BB vect fail to SLP one case

2021-01-05 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #4 from Kewen Lin ---
(In reply to Kewen Lin from comment #3)
>
> IIUC, in current implementation, we get four grouped stores:
> { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] } /i=0,1,2,3/ independently
>
> When all these tryings

[Bug tree-optimization/98138] BB vect fail to SLP one case

2020-12-06 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #3 from Kewen Lin ---
(In reply to Richard Biener from comment #2)
> So the expected vectorization builds vectors
>
> { tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] }
>
> that's not SLP, SLP tries to build the
>
> { tmp[i][0], tmp[

[Bug tree-optimization/98138] BB vect fail to SLP one case

2020-12-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #2 from Richard Biener ---
So the expected vectorization builds vectors
  { tmp[0][0], tmp[1][0], tmp[2][0], tmp[3][0] }
that's not SLP, SLP tries to build the
  { tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3] }
vector and "succeeds" -
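The two groupings contrasted in comment #2 can be illustrated with a minimal C sketch. The helper names (butterfly4, transform_rows, gather_column) and the butterfly arithmetic are assumptions made for illustration; they are not code from the bug report or from x264. Each loop iteration writes one row group tmp[i][0..3], which is the store group SLP discovery starts from, while the column vector { tmp[0][c], tmp[1][c], tmp[2][c], tmp[3][c] } takes one element from each of the four iterations.

```c
#include <stdint.h>

/* Hypothetical 4-point butterfly, for illustration only. */
static void butterfly4(uint32_t d[4], uint32_t s0, uint32_t s1,
                       uint32_t s2, uint32_t s3)
{
    uint32_t t0 = s0 + s1;
    uint32_t t1 = s0 - s1;   /* unsigned arithmetic wraps modulo 2^32 */
    uint32_t t2 = s2 + s3;
    uint32_t t3 = s2 - s3;
    d[0] = t0 + t2;
    d[1] = t1 + t3;
    d[2] = t0 - t2;
    d[3] = t1 - t3;
}

/* Row-wise grouping: iteration i stores the four adjacent
   elements tmp[i][0], tmp[i][1], tmp[i][2], tmp[i][3]. */
static void transform_rows(uint32_t tmp[4][4], uint32_t in[4][4])
{
    for (int i = 0; i < 4; i++)
        butterfly4(tmp[i], in[i][0], in[i][1], in[i][2], in[i][3]);
}

/* Column-wise grouping: collects { tmp[0][c], tmp[1][c], tmp[2][c],
   tmp[3][c] }, i.e. the same lane from all four iterations. */
static void gather_column(uint32_t out[4], uint32_t tmp[4][4], int c)
{
    for (int i = 0; i < 4; i++)
        out[i] = tmp[i][c];
}
```

In this sketch the column elements are a stride of four apart in memory, so they do not form a contiguous store group the way each row does.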

[Bug tree-optimization/98138] BB vect fail to SLP one case

2020-12-04 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #1 from Kewen Lin ---
A similar case is x264_pixel_satd_8x4 in x264:
https://github.com/mirror/x264/blob/4121277b40a667665d4eea1726aefdc55d12d110/common/pixel.c#L288