[Bug target/117304] ICE: in emit_move_insn, at expr.cc:4633 with -mavx10.1 and __builtin_ia32_cvtudq2ps512_mask()

2024-11-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117304 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/117839] Redundant vector XOR instructions

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117839 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/73350] AVX512: GCC optimizes away rounding flags

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||liuhongt at gcc dot gnu.org Resolution|--- |FIXED --- Comment #11 from Hongtao Liu --- CSE issue is fixed in GCC8.1, mask issue is fixed in GCC10.1

[Bug target/80862] [x86] Wrong rounding results for some test cases

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862 Bug 80862 depends on bug 73350, which changed state. Bug 73350 Summary: AVX512: GCC optimizes away rounding flags https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 What|Removed |Added --

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #10 from Hongtao Liu --- > > I do wonder about the usefulness of the memory alternative on the > sse_movhlps pattern though, there's the sse_storehps pattern which > also models the store part more precisely as V2SFmode. Is > sse_

[Bug middle-end/117823] sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117823 --- Comment #1 from Hongtao Liu --- The vectorization may need -ffast-math.
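
One common reason a floating-point dot-product reduction needs -ffast-math (or at least -fassociative-math) to vectorize is that a vector reduction adds the products in a different order than the scalar loop, and float addition is not associative. The sketch below is an illustration only, not code from the bug report: `dot_sequential` and `dot_two_lanes` are hypothetical names, and the two-lane split merely mimics what a 2-wide vector reduction might do.

```c
#include <stddef.h>

/* Scalar reference: products are accumulated strictly left to right. */
float dot_sequential(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical model of a 2-wide vector reduction: even and odd elements
   feed separate partial sums that are combined only at the end.
   Assumes n is even. */
float dot_two_lanes(const float *a, const float *b, size_t n)
{
    float lane0 = 0.0f, lane1 = 0.0f;
    for (size_t i = 0; i + 1 < n; i += 2) {
        lane0 += a[i] * b[i];
        lane1 += a[i + 1] * b[i + 1];
    }
    return lane0 + lane1;
}
```

With inputs such as a = {1e8, 1, -1e8, 1} and b set to all ones, the two functions can return different values under IEEE single precision, which is why the compiler does not reorder such a reduction without being told it may.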

[Bug middle-end/117823] New: sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- Currently, sdot_prod only supports integer mode, can it be extended to floating point? For x86, there're vdpbf16ps/vdpphps for dot_prod __bf16/_Float16 -> fl

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116675, which changed state. Bug 116675 Summary: No blend constant permute for V8HImode with just SSE2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 What|Removed |Added ---

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 --- Comment #7 from Hongtao Liu --- (In reply to Rainer Orth from comment #6) > The test is broken: > > +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4 > +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4 >

[Bug target/117734] Misses VNNI pmaddubsw qi->hi dot_prod

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117734 Hongtao Liu changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED

[Bug middle-end/117823] sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117823 --- Comment #3 from Hongtao Liu --- > Whether it needs -ffast-math depends on how it behaves with respect to > rounding I guess. If (float)bf16 * (float)bf16 + (float)bf16 * (float)bf16 > performs the float add without intermediate rounding for

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #2 from Hongtao Liu --- My guess is that there's a hot loop with a low trip count (less than one 128-bit vector), so avx512_two_epilogues only adds more cmp/jcc instructions but doesn't execute any real vector instructions.

[Bug target/117495] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with -ffast-math -mavx10.2-512 and __bf16 compare int

2024-11-13 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117495 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/117418] ICE: in plus_constant, at explow.cc:102 with -mx32 -maddress-mode=long and __builtin_ia32_encodekey256_u32()

2024-11-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117418 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug middle-end/117542] Missed loop vectorization for truncate from float to __bf16.

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 --- Comment #5 from Hongtao Liu --- > Yes, something like this should work. I suggest to polish up a patch > with this also containing the backend pattern adjustments and post it > for review. The alternative is a convert optab for vec_pack_t

[Bug middle-end/117542] Missed loop vectorization for truncate from float to __bf16.

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 --- Comment #3 from Hongtao Liu --- (In reply to Hongtao Liu from comment #2) > (In reply to Richard Biener from comment #1) > > It doesn't even unambiguously specify whether the mode is that of the source > > or the destination. The original i

[Bug target/117697] gcc.target/i386/avx10_2-vmovd-1.c etc. FAIL

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117697 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/115438] [15 Regression] 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438 --- Comment #8 from Hongtao Liu --- > > This might in the end be fallout of different sinking?! > > One difference wrt SLP vs. non-SLP is that with SLP we are taking the > initial value as the initial value with SLP while with non-SLP we > ar

[Bug target/117608] [15 Regression] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with __builtin_ia32_prefetch(0, 2, 0, 0) since r15-4833

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117608 --- Comment #6 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #4) > int i; > > void > foo (void) > { > __builtin_prefetch (&i, 2, 0); > } > > ICEs as well since that revision, and I think it actually ICEs on many > targets as w

[Bug tree-optimization/115438] [15 Regression] 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438 --- Comment #7 from Hongtao Liu --- I only observed a ~3% regression on ICX. The regressed build executes fewer instructions but is more backend-bound, which lowers IPC and slows performance.

[Bug target/117734] Misses VNNI pmaddubsw qi->hi dot_prod

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117734 --- Comment #1 from Hongtao Liu --- But there's a saturation inside pmaddubsw, not a simple dot_prod pattern.
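
The saturation point can be made concrete with a scalar model. The sketch below assumes the documented behaviour of pmaddubsw (unsigned bytes times signed bytes, adjacent products added with signed saturation to 16 bits); `sat16`, `pmaddubsw_lane` and `dot_pair` are hypothetical helper names used only for illustration, not GCC or intrinsic APIs.

```c
#include <stdint.h>

/* Clamp a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 16-bit result lane of pmaddubsw:
   a0, a1 are unsigned bytes, b0, b1 are signed bytes, and the
   sum of the two products is saturated to 16 bits. */
int16_t pmaddubsw_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    return sat16(sum);
}

/* The same pair of products as a plain dot product step:
   accumulated in a wider type, with no saturation. */
int32_t dot_pair(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}
```

For small inputs the two agree, but for a pair such as (255, 127) and (255, 127) the exact sum exceeds the int16_t range, so the saturating lane and the plain dot product diverge. That difference is what keeps the instruction from matching a simple qi->hi dot_prod.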

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org

[Bug target/117860] GCC emits an unnecessary mov for x86 _addcarry/_subborrow intrinsic calls where the second operand is a constant that is within the range of a 32-bit integer

2024-12-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117860 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 113600, which changed state. Bug 113600 Summary: [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 What|Removed

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 113600, which changed state. Bug 113600 Summary: [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 What|Removed

[Bug target/113600] [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116675, which changed state. Bug 116675 Summary: No blend constant permute for V8HImode with just SSE2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 What|Removed |Added ---

[Bug target/117608] [15 Regression] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with __builtin_ia32_prefetch(0, 2, 0, 0)

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117608 --- Comment #2 from Hongtao Liu --- @hulin please take a look.

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 --- Comment #7 from Hongtao Liu --- (In reply to Hongtao Liu from comment #6) > (In reply to Jakub Jelinek from comment #5) > > So if anything, one would need to decide this on something larger rather > > than small testcases, say build the whol

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
|1 Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org Last reconfirmed||2024-12-03

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 --- Comment #1 from Hongtao Liu --- This is the case that fails to recognize the innermost loop correctly. typedef unsigned short ggml_fp16_t; static float table_f32_f16[1 << 16]; inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) { un

[Bug tree-optimization/117888] New: cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- > > that's the r15-919-gef27b91b62c3aa change I think.

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 --- Comment #3 from Hongtao Liu --- (In reply to Richard Biener from comment #2) > The question is how we should define innermost - consider > > - loop interchange > - inlining of a function body with a loop into a loop > > the simplest appr

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|liuhongt at gcc

[Bug rtl-optimization/117890] Wrong code with -fvect-cost-model=unlimited

2024-12-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117890 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > (In reply to Hongtao Liu from comment #4) > > The insn is generated by avoid_store_fowarding, and it is valid but failed > > reload > > Reload want to find a insn t

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #4 from Hongtao Liu --- The insn is generated by avoid_store_fowarding, and it is valid but failed reload 170Store forwarding detected: 171From: (insn 24 23 25 2 (set (mem/c:SI (pl

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > The insn is generated by avoid_store_fowarding, and it is valid but failed > reload Reload wants to find an insn to move data from GPR to SSE_REGS but *movti_internal

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 Hongtao Liu changed: What|Removed |Added Assignee|liuhongt at gcc dot gnu.org|unassigned at gcc dot gnu.org

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #7 from Hongtao Liu --- 5024 Choosing alt 6 in insn 295: (0) ?jc (1) Yd {*movti_internal} (sp_off=-128) 5025 Change to class INDEX_GPR16 for r273

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #8 from Hongtao Liu --- > Why class is changed to INDEX_GPR16 for r273 Note with -mapxf, ICE disappears

[Bug target/118333] gcc/config/i386/i386-expand.cc:24871: Pointless condition ?

2025-01-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 --- Comment #2 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #1) > (In reply to David Binderman from comment #0) > > Static analyser cppcheck says: > > > > gcc/config/i386/i386-expand.cc:24871:35: warning: Identical condition > > '

[Bug tree-optimization/118189] New: Weired vec_contruct of elements who's from continuous memory

2024-12-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
issed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Blocks: 53947 Target Milestone: --- double foo (double* a, double* b, double c) { c +=

[Bug target/117082] [15 Regression] FAIL: gcc.target/i386/stack-check-17.c since r15-1619-g3b9b8d6cfdf593

2025-02-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117082 --- Comment #6 from Hongtao Liu --- (In reply to H.J. Lu from comment #5) > It isn't a dup of PR 117081 since it is a different failure. But it's caused by the same commit and the same root cause?

[Bug c++/79786] [12/13/14 Regression] ICE tree check: expected class 'type', have 'declaration' (var_decl) in iamcu_alignment, at config/i386/i386.c:30263

2025-02-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79786 --- Comment #11 from Hongtao Liu --- (In reply to Andrew Pinski from comment #8) > (In reply to Hongtao Liu from comment #7) > > (In reply to Richard Biener from comment #6) > > > Hongtao - do we care about -miamcu? Should we eventually deprecat

[Bug rtl-optimization/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2025-02-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #8 from Hongtao Liu --- (In reply to H.J. Lu from comment #7) > Created attachment 60350 [details] > ira: Don't increase callee-saved register cost by 1000x NOTE, r15-1619-g3b9b8d6cfdf593 improved 500.perlbench_r on many different p

[Bug rtl-optimization/108707] suboptimal allocation with same memory op for many different instructions.

2025-02-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707 --- Comment #11 from Hongtao Liu --- (In reply to Hongtao Liu from comment #10) > (In reply to Pranav Gorantla from comment #9) > > Facing similar issue in gcc-13. Is it possible to backport the fix of this > > Bug 108707 and Bug 109610 to gcc-1

[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555

2025-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #17 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #15) > Created attachment 60411 [details] > gcc15-pr118623.patch > > Untested patch which seems to work for me on the new testcases and > i386.exp=bt*.c so far. When

[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555

2025-02-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #16 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #14) > So, if (reg:CCC flags) being non-zero in RTL means nc and (reg:CCC flags) > being zero in RTL means c, shouldn't *bt be using (compare:CCC > (zero_extract ...) (

[Bug rtl-optimization/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2025-02-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #16 from Hongtao Liu --- (In reply to H.J. Lu from comment #15) > r15-7400-gd3ff498c478ace gave > > $ cat x.c > int f (int); > int > advance (int dz) > { > if (dz > 0) > return (dz + dz) * dz; > else > return dz * f (dz)

[Bug rtl-optimization/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2025-02-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #9 from Hongtao Liu --- (In reply to Hongtao Liu from comment #8) > (In reply to H.J. Lu from comment #7) > > Created attachment 60350 [details] > > ira: Don't increase callee-saved register cost by 1000x > > NOTE, r15-1619-g3b9b8d6

[Bug rtl-optimization/108707] suboptimal allocation with same memory op for many different instructions.

2025-02-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108707 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug rtl-optimization/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2025-02-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #14 from Hongtao Liu --- > can be sinked to else branch(as sub + mov). When jle .L2 is not taken, > it can save one push instruction. And that's why 511.povray_r is improved. plus one pop instruction.

[Bug rtl-optimization/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2025-02-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #13 from Hongtao Liu --- (In reply to H.J. Lu from comment #10) > (In reply to Hongtao Liu from comment #9) > > (In reply to Hongtao Liu from comment #8) > > > (In reply to H.J. Lu from comment #7) > > > > Created attachment 60350 [d

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/117874] [15 Regression] 17% regression for 433.milc on Zen4

2024-12-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117874 --- Comment #11 from Hongtao Liu --- (In reply to Richard Biener from comment #10) > The mult_su3_an part is now resolved. See PR117888 for the rest. Fixed by r15-6097-gee2f19b0937b5efc0b23c4319cbd4a38b27eac6e

[Bug target/118017] [15 Regression] ICE: maximum number of generated reload insns per insn achieved (90) with -Og -frounding-math -mno-80387 -mno-mmx

2024-12-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118017 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug tree-optimization/118055] [15 Regression] gcc.dg/tree-ssa/pr83403-1.c and -2 for CRIS and m68k since r15-6097-gee2f19b0937b5e

2024-12-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118055 --- Comment #3 from Hongtao Liu --- > > Is it perhaps that the test is brittle; mostly target-specific despite being > at the tree-level and that instead the scan-test should be a specific > known-matching target list? The testcase is used to

[Bug tree-optimization/118055] [15 Regression] gcc.dg/tree-ssa/pr83403-1.c and -2 for CRIS and m68k since r15-6097-gee2f19b0937b5e

2024-12-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118055 --- Comment #1 from Hongtao Liu --- I explained in the thread. https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671289.html - BTW arm ci reported 2 regressed testcase so I added * gcc.dg/tree-ssa/pr83403-1.c: Add --param max-

[Bug c++/118021] New: [15 regression] ICE in parser

2024-12-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
++ Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- cat test.i class a; class b { public: ~b(); }; template void d(c, c, e); class h : b { int f; }; template class autovector { public: using i = g; template class j {}; using

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #7 from Hongtao Liu --- > Huh. It looks like this is from a V4SF -> 2xV2DF extension via > vec_unpack_{hi,lo}_expr. > > Originally this is > > (insn 1161 1160 1162 58 (set (reg:V4SF 853) > (vec_select:V4SF (vec_concat:V8S

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #8 from Hongtao Liu --- > vec_unpacks_hi_v4sf create an unintialized (reg:V4SF 853), I guess it may > confuse LRA to allocate a mem for it. For simple case void foo (double* a, float* b, int n) { for (int i = 0; i != n; i++)

[Bug target/118380] GCC is not optimizing computataion and code with avx intrinsics.

2025-01-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118380 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/115777] [12/13/14/15 regression] Severe performance regression on insertion sort at -O2 or above

2025-01-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115777 --- Comment #10 from Hongtao Liu --- > That's probably the conservative answer for BB vectorization, for loop vect > we know all those uses will be also in vector code. For BB vectorization > there is currently no easly reliable check to ensur

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-01-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #3 from Hongtao Liu --- (In reply to Andrew Pinski from comment #1) > I think this is similar to pr 113646 really. Looks like PR 113646 is PGO not autofdo, so the issue could be different.

[Bug gcov-profile/118551] Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-01-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118551 --- Comment #2 from Hongtao Liu --- A hack like the one below can recover performance and further improves 538.imagick_r by 5% w/ autofdo. The hack prevents the scaling if ipa_count is zero but function body is hot. diff --git a/gcc/predict.cc b/gcc/pr

[Bug gcov-profile/118581] New: auto_profile can't annotate bb with all debug_stmt which assigned value with constant

2025-01-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
IRMED Severity: normal Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- The source code is like if( TEST_FLAG_SWEEP( srcGrid,

[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555

2025-01-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #10 from Hongtao Liu --- > > r12-7751-g919fbffef07555 > > that might have just exposed a latent issue It should be; the guilty commit just extends a splitter to handle the reversed condition, and I didn't see anything abnormal.

[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555

2025-01-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #12 from Hongtao Liu --- 1370Trying 35 -> 20: 1371 35: flags:CCC=cmp(zero_extract(r104:SI,0x1,r105:SI#0),0) 1372 REG_DEAD r104:SI 1373 REG_DEAD r105:SI 1374 20: pc={(flags:CCC!=0)?L26:pc} 1375 REG_BR_PROB 107374183

[Bug rtl-optimization/118623] [12/13/14/15 regression] Miscompile with -O2/3 and -O0/1 since r12-7751-g919fbffef07555

2025-01-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118623 --- Comment #11 from Hongtao Liu --- 283(insn 8 7 9 2 (set (reg:SI 107) 284(const_int 1 [0x1])) "test.c":3:7 -1 285 (nil)) 286(insn 9 8 10 2 (parallel [ 287(set (reg:SI 106 [ e_7 ]) 288(ashift:SI (reg:SI 1

[Bug gcov-profile/118551] New: Autofdo regressed 538.imagick_r by ~10% with -march=x86-64-v3 -O2

2025-01-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- Similar to PR116743, it's related to IPA scaling, but in a different place (estimate_bb_frequencies). /* If we

[Bug gcov-profile/118581] auto_profile can't annotate bb with all debug_stmt which assigned value with constant

2025-01-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118581 --- Comment #4 from Hongtao Liu --- Note it's from SPEC2017 519.lbm_r

[Bug gcov-profile/118581] auto_profile can't annotate bb with all debug_stmt which assigned value with constant

2025-01-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118581 --- Comment #5 from Hongtao Liu --- (In reply to Richard Biener from comment #2) > (In reply to Richard Biener from comment #1) > > Does it have counter info for PHI arguments (aka copies emitted on those > > edges)? > > I think yes, so IMO it

[Bug gcov-profile/118581] auto_profile can't annotate bb with all debug_stmt which assigned value with constant

2025-01-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118581 --- Comment #3 from Hongtao Liu --- (In reply to Richard Biener from comment #2) > (In reply to Richard Biener from comment #1) > > Does it have counter info for PHI arguments (aka copies emitted on those > > edges)? > > I think yes, so IMO it

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2025-01-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863 Bug 89863 depends on bug 118333, which changed state. Bug 118333 Summary: gcc/config/i386/i386-expand.cc:24871: Pointless condition ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 What|Removed |Added -

[Bug target/118333] gcc/config/i386/i386-expand.cc:24871: Pointless condition ?

2025-01-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
|--- Assignee|liuhongt at gcc dot gnu.org|unassigned at gcc dot gnu.org Status|ASSIGNED|UNCONFIRMED Ever confirmed|1 |0 --- Comment #2 from Hongtao Liu --- There's a typo in the commit, sorry for

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |liuhongt at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2025-01-16 --- Comment #1 from Hongtao Liu --- Mine.

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org Ever confirmed|0 |1

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118489 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug gcov-profile/118508] New: 10% performance drop when enabling autofdo for spec2017 554.roms_r

2025-01-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: liuhongt at gcc dot gnu.org Target Milestone: --- with -march=x86-64-v3 -O2. part of dump_gcov is like __step3d_t_mod_MOD_step3d_t total:5500129 head:0 0: 0 29: 0 30

[Bug target/115777] [12/13/14/15 regression] Severe performance regression on insertion sort at -O2 or above

2025-01-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115777 --- Comment #8 from Hongtao Liu --- > in backend costing we do anticipate the vector construction to happen > by loading from memory though, so we don't account for the extra > GPR->xmm move penalty. Yes, I saw something similar before and had

[Bug target/114978] [14/15 regression] 548.exchange2_r 14%-28% regressions on Loongarch64 after gcc 14 snapshot 20240317

2025-03-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114978 --- Comment #33 from Hongtao Liu --- I have a fix in ivopt for x86 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842#c6; you could try it and see if that helps.

[Bug target/117069] [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7

2025-03-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 --- Comment #8 from Hongtao Liu --- (In reply to Hongtao Liu from comment #6) > It looks like the testcase is fragile: it's supposed to check the compiler's > ability to generate the code_6_gottpoff_reloc instruction, but it failed since > there's a se

[Bug target/117069] [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7

2025-03-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 --- Comment #9 from Hongtao Liu --- (In reply to Sam James from comment #7) > This stopped failing for me around: > > commit 2bc3ea210565dc7cdbba9adb31acceefed406254 > Author: Sam James > Date: Fri Nov 22 15:20:45 2024 + > > saving

[Bug target/117452] ICE: in patch_jump_insn, at cfgrtl.cc:1303 with -Ofast -mavx10.2 and __bf16

2025-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
||liuhongt at gcc dot gnu.org --- Comment #4 from Hongtao Liu --- I'll take a look.

[Bug target/117069] [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7

2025-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 --- Comment #14 from Hongtao Liu --- (In reply to Sam James from comment #13) > (In reply to Hongtao Liu from comment #9) > > I didn't find this commit in gcc trunk? > > Ah, sorry for confusion: it's from my local test results. Only the date >

[Bug target/117069] [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7

2025-03-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 --- Comment #15 from Hongtao Liu --- (In reply to Sam James from comment #7) > This stopped failing for me around: > > commit 2bc3ea210565dc7cdbba9adb31acceefed406254 > Author: Sam James > Date: Fri Nov 22 15:20:45 2024 + > > saving

[Bug target/115842] [15 Regression] 6.5% slowdown of 548.exchange2_r on Intel Ice Lake

2025-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115842 --- Comment #8 from Hongtao Liu --- (In reply to Tamar Christina from comment #7) > (In reply to Hongtao Liu from comment #6) > > I noticed some double-counting of cost in group-candidate (regarding loop > > invariant expressions), this modific

[Bug target/118753] [15 Regression] [meta-bug] GCC 15 Regression on x86

2025-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118753 Bug 118753 depends on bug 117069, which changed state. Bug 117069 Summary: [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 What|Removed

[Bug target/117069] [15 Regression] gcc.target/i386/apx-ndd-tls-1b.c since r15-268-g9dbff9c05520a7

2025-03-18 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117069 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED

[Bug target/117452] ICE: in patch_jump_insn, at cfgrtl.cc:1303 with -Ofast -mavx10.2 and __bf16

2025-03-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117452 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/119617] ICE: in standard_sse_constant_opcode, at config/i386/i386.cc:5465 with -fzero-call-used-regs=all -mabi=ms -mavx512f -mno-evex512

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119617 --- Comment #3 from Hongtao Liu --- (In reply to Hongtao Liu from comment #2) > (In reply to Richard Biener from comment #1) > > I think we need to disable the effect of -mno-evex512, looks like there's > > still traces of it left? > > Let's ha

[Bug target/119596] x86: too eager use of rep movsq/rep stosq for inlined ops

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119596 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/102294] memset expansion is sometimes slow for small sizes

2025-04-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment

[Bug target/119464] VEC_PERM_EXPR not optimized to pslldq instruction for AVX2 and AVX512BW

2025-03-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119464 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment
