[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118489 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug gcov-profile/118508] New: 10% performance drop when enabling autofdo for spec2017 554.roms_r

2025-01-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118508 Bug ID: 118508 Summary: 10% performance drop when enabling autofdo for spec2017 554.roms_r Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118489 Hongtao Liu changed: What|Removed |Added Last reconfirmed||2025-01-16 Status|UNCONFIRMED

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118489 Hongtao Liu changed: What|Removed |Added Last reconfirmed|2025-01-16 00:00:00 | Target Milestone|15.0

[Bug target/118489] [15 Regression][avx512] ICE in ix86_expand_vector_bf2sf_with_vec_perm, at config/i386/i386-expand.cc:26917 since r15-4955-g648bd1fcc6acfc

2025-01-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118489 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at

[Bug target/118333] gcc/config/i386/i386-expand.cc:24871: Pointless condition ?

2025-01-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug other/89863] [meta-bug] Issues in gcc that other static analyzers (cppcheck, clang-static-analyzer, PVS-studio) find that gcc misses

2025-01-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863 Bug 89863 depends on bug 118333, which changed state. Bug 118333 Summary: gcc/config/i386/i386-expand.cc:24871: Pointless condition ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 What|Removed |Added -

[Bug target/115777] [12/13/14/15 regression] Severe performance regression on insertion sort at -O2 or above

2025-01-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115777 --- Comment #10 from Hongtao Liu --- > That's probably the conservative answer for BB vectorization, for loop vect > we know all those uses will be also in vector code. For BB vectorization > there is currently no easily reliable check to ensur

[Bug target/115777] [12/13/14/15 regression] Severe performance regression on insertion sort at -O2 or above

2025-01-09 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115777 --- Comment #8 from Hongtao Liu --- > in backend costing we do anticipate the vector construction to happen > by loading from memory though, so we don't account for the extra > GPR->xmm move penalty. Yes, I saw something similar before and had

[Bug target/118380] GCC is not optimizing computataion and code with avx intrinsics.

2025-01-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118380 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug target/118333] gcc/config/i386/i386-expand.cc:24871: Pointless condition ?

2025-01-07 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118333 --- Comment #2 from Hongtao Liu --- (In reply to Uroš Bizjak from comment #1) > (In reply to David Binderman from comment #0) > > Static analyser cppcheck says: > > > > gcc/config/i386/i386-expand.cc:24871:35: warning: Identical condition > > '

[Bug tree-optimization/118189] New: Weired vec_contruct of elements who's from continuous memory

2024-12-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118189 Bug ID: 118189 Summary: Weired vec_contruct of elements who's from continuous memory Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimizati

[Bug target/118017] [15 Regression] ICE: maximum number of generated reload insns per insn achieved (90) with -Og -frounding-math -mno-80387 -mno-mmx

2024-12-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118017 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug tree-optimization/118055] [15 Regression] gcc.dg/tree-ssa/pr83403-1.c and -2 for CRIS and m68k since r15-6097-gee2f19b0937b5e

2024-12-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118055 --- Comment #3 from Hongtao Liu --- > > Is it perhaps that the test is brittle; mostly target-specific despite being > at the tree-level and that instead the scan-test should be a specific > known-matching target list? The testcase is used to

[Bug tree-optimization/118055] [15 Regression] gcc.dg/tree-ssa/pr83403-1.c and -2 for CRIS and m68k since r15-6097-gee2f19b0937b5e

2024-12-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118055 --- Comment #1 from Hongtao Liu --- I explained this in the thread: https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671289.html - BTW, the Arm CI reported 2 regressed testcases, so I added * gcc.dg/tree-ssa/pr83403-1.c: Add --param max-

[Bug c++/118021] New: [15 regression] ICE in parser

2024-12-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118021 Bug ID: 118021 Summary: [15 regression] ICE in parser Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assi

[Bug target/117874] [15 Regression] 17% regression for 433.milc on Zen4

2024-12-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117874 --- Comment #11 from Hongtao Liu --- (In reply to Richard Biener from comment #10) > The mult_su3_an part is now resolved. See PR117888 for the rest. Fixed by r15-6097-gee2f19b0937b5efc0b23c4319cbd4a38b27eac6e

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-10 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 Hongtao Liu changed: What|Removed |Added Assignee|liuhongt at gcc dot gnu.org|unassigned at gcc dot gnu.org

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #8 from Hongtao Liu --- > Why is the class changed to INDEX_GPR16 for r273? Note: with -mapxf, the ICE disappears.

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #7 from Hongtao Liu --- Choosing alt 6 in insn 295: (0) ?jc (1) Yd {*movti_internal} (sp_off=-128) Change to class INDEX_GPR16 for r273

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #6 from Hongtao Liu --- (In reply to Hongtao Liu from comment #5) > (In reply to Hongtao Liu from comment #4) > > The insn is generated by avoid_store_forwarding, and it is valid but failed > > reload > > Reload wants to find an insn t

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #5 from Hongtao Liu --- (In reply to Hongtao Liu from comment #4) > The insn is generated by avoid_store_forwarding, and it is valid but failed > reload Reload wants to find an insn to move data from GPR to SSE_REGS, but *movti_internal

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 --- Comment #4 from Hongtao Liu --- The insn is generated by avoid_store_forwarding; it is valid but fails reload. Store forwarding detected: From: (insn 24 23 25 2 (set (mem/c:SI (pl

[Bug target/117946] ICE: maximum number of generated reload insns per insn achieved (90) with -O -favoid-store-forwarding -mavx10.1 -mprefer-avx128 --param=store-forwarding-max-distance=128

2024-12-08 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117946 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org

[Bug rtl-optimization/117890] Wrong code with -fvect-cost-model=unlimited

2024-12-03 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117890 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|NEW Assignee|liuhongt at gcc do

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 --- Comment #3 from Hongtao Liu --- (In reply to Richard Biener from comment #2) > The question is how we should define innermost - consider > > - loop interchange > - inlining of a function body with a loop into a loop > > the simplest appr

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Ever confirmed|0

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 --- Comment #7 from Hongtao Liu --- (In reply to Hongtao Liu from comment #6) > (In reply to Jakub Jelinek from comment #5) > > So if anything, one would need to decide this on something larger rather > > than small testcases, say build the whol

[Bug tree-optimization/117888] cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 --- Comment #1 from Hongtao Liu --- This is the case where cunrolli fails to recognize the innermost loop correctly. typedef unsigned short ggml_fp16_t; static float table_f32_f16[1 << 16]; inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) { un

[Bug tree-optimization/117888] New: cunrolli doesn't accurately remember what's "innermost"

2024-12-02 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117888 Bug ID: 117888 Summary: cunrolli doesn't accurately remember what's "innermost" Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimization

[Bug target/117860] GCC emits an unnecessary mov for x86 _addcarry/_subborrow intrinsic calls where the second operand is a constant that is within the range of a 32-bit integer

2024-12-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117860 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug target/117006] [15 regression] GCC trunk generates larger code than GCC 14 at -Os

2024-12-01 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117006 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org Las

[Bug target/117839] Redundant vector XOR instructions

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117839 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug target/80862] [x86] Wrong rounding results for some test cases

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80862 Bug 80862 depends on bug 73350, which changed state. Bug 73350 Summary: AVX512: GCC optimizes away rounding flags https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 What|Removed |Added --

[Bug target/73350] AVX512: GCC optimizes away rounding flags

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED CC|

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116675, which changed state. Bug 116675 Summary: No blend constant permute for V8HImode with just SSE2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 What|Removed |Added ---

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-28 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED

[Bug middle-end/117823] sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117823 --- Comment #3 from Hongtao Liu --- > Whether it needs -ffast-math depends on how it behaves with respect to > rounding I guess. If (float)bf16 * (float)bf16 + (float)bf16 * (float)bf16 > performs the float add without intermediate rounding for

[Bug middle-end/117823] sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117823 --- Comment #1 from Hongtao Liu --- The vectorization may need -ffast-math.

[Bug middle-end/117823] New: sdot_prod pattern extended to floating point?

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117823 Bug ID: 117823 Summary: sdot_prod pattern extended to floating point? Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: mi

[Bug target/117734] Misses VNNI pmaddubsw qi->hi dot_prod

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117734 Hongtao Liu changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 --- Comment #7 from Hongtao Liu --- (In reply to Rainer Orth from comment #6) > The test is broken: > > +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pand 4 > +UNRESOLVED: gcc.target/i386/pr116675.c scan-assembler-times pandn 4 >

[Bug tree-optimization/115438] [15 Regression] 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438 --- Comment #8 from Hongtao Liu --- > > This might in the end be fallout of different sinking?! > > One difference wrt SLP vs. non-SLP is that with SLP we are taking the > initial value as the initial value with SLP while with non-SLP we > ar

[Bug tree-optimization/115438] [15 Regression] 503.bwaves_r regressed 5-11% on different x86_64 machines at -Ofast -march=native since r15-1006-gd93353e6423eca

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115438 --- Comment #7 from Hongtao Liu --- I only observed a ~3% regression on ICX. The regressed build executes fewer instructions but is more backend-bound, which lowers IPC and slows it down.

[Bug target/117734] Misses VNNI pmaddubsw qi->hi dot_prod

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117734 --- Comment #1 from Hongtao Liu --- But there's a saturation inside pmaddubsw, not a simple dot_prod pattern.
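To illustrate the saturation point above, here is a minimal scalar sketch of what one 16-bit result lane of pmaddubsw computes, next to a plain widening sum of products. The helper names (sat16, pmaddubsw_lane, dot_lane) are hypothetical and exist only for this illustration; they are not GCC internals or intrinsics.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 16-bit lane of pmaddubsw (hypothetical helper):
   unsigned bytes a0,a1 are multiplied by signed bytes b0,b1 and the two
   products are added with signed saturation. */
static int16_t pmaddubsw_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;
    return sat16(sum);
}

/* The same sum of products without saturation, kept in a wider type
   (hypothetical helper, for comparison only). */
static int32_t dot_lane(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}
```

For inputs such as a0 = a1 = 255 and b0 = b1 = 127 the exact sum is 64770, which does not fit in 16 bits, so the saturating lane returns 32767. That difference is why the instruction is not a drop-in match for a non-saturating dot product.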

[Bug target/117608] [15 Regression] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with __builtin_ia32_prefetch(0, 2, 0, 0) since r15-4833

2024-11-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117608 --- Comment #6 from Hongtao Liu --- (In reply to Jakub Jelinek from comment #4) > int i; > > void > foo (void) > { > __builtin_prefetch (&i, 2, 0); > } > > ICEs as well since that revision, and I think it actually ICEs on many > targets as w

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 116675, which changed state. Bug 116675 Summary: No blend constant permute for V8HImode with just SSE2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 What|Removed |Added ---

[Bug target/116675] No blend constant permute for V8HImode with just SSE2

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116675 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/113600] [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 113600, which changed state. Bug 113600 Summary: [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 What|Removed

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 113600, which changed state. Bug 113600 Summary: [14/15 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600 What|Removed

[Bug target/117608] [15 Regression] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with __builtin_ia32_prefetch(0, 2, 0, 0)

2024-11-25 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117608 --- Comment #2 from Hongtao Liu --- @hulin please take a look.

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #10 from Hongtao Liu --- > > I do wonder about the usefulness of the memory alternative on the > sse_movhlps pattern though, there's the sse_storehps pattern which > also models the store part more precisely as V2SFmode. Is > sse_

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #8 from Hongtao Liu --- > vec_unpacks_hi_v4sf creates an uninitialized (reg:V4SF 853); I guess it may > confuse LRA into allocating a mem for it. For a simple case: void foo (double* a, float* b, int n) { for (int i = 0; i != n; i++)

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #7 from Hongtao Liu --- > Huh. It looks like this is from a V4SF -> 2xV2DF extension via > vec_unpack_{hi,lo}_expr. > > Originally this is > > (insn 1161 1160 1162 58 (set (reg:V4SF 853) > (vec_select:V4SF (vec_concat:V8S

[Bug middle-end/117542] Missed loop vectorization for truncate from float to __bf16.

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 --- Comment #5 from Hongtao Liu --- > Yes, something like this should work. I suggest to polish up a patch > with this also containing the backend pattern adjustments and post it > for review. The alternative is a convert optab for vec_pack_t

[Bug target/117697] gcc.target/i386/avx10_2-vmovd-1.c etc. FAIL

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117697 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug middle-end/117542] Missed loop vectorization for truncate from float to __bf16.

2024-11-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 --- Comment #3 from Hongtao Liu --- (In reply to Hongtao Liu from comment #2) > (In reply to Richard Biener from comment #1) > > It doesn't even unambiguously specify whether the mode is that of the source > > or the destination. The original i

[Bug target/117562] [15 Regression] 40% slowdown of 482.sphinx3 on Zen4, Zen5 since r15-5120-g9a62c149589103

2024-11-19 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117562 --- Comment #2 from Hongtao Liu --- My guess is that there's a hot loop with a low trip count (less than a 128-bit vector), so avx512_two_epilogues only adds more cmp/jcc instructions without executing any real vector instructions.

[Bug target/117418] ICE: in plus_constant, at explow.cc:102 with -mx32 -maddress-mode=long and __builtin_ia32_encodekey256_u32()

2024-11-17 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117418 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/117495] ICE: in extract_insn, at recog.cc:2882 (unrecognizable insn) with -ffast-math -mavx10.2-512 and __bf16 compare int

2024-11-13 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117495 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug middle-end/117542] Missed loop vectorization for truncate from float to __bf16.

2024-11-12 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 --- Comment #2 from Hongtao Liu --- (In reply to Richard Biener from comment #1) > It doesn't even unambiguously specify whether the mode is that of the source > or the destination. The original idea was of course that the size > unambiguously

[Bug middle-end/117542] New: Missed loop vectorization for truncate from float to __bf16.

2024-11-11 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117542 Bug ID: 117542 Summary: Missed loop vectorization for truncate from float to __bf16. Product: gcc Version: 15.0 Status: UNCONFIRMED Keywords: missed-optimizati

[Bug target/117304] ICE: in emit_move_insn, at expr.cc:4633 with -mavx10.1 and __builtin_ia32_cvtudq2ps512_mask()

2024-11-06 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117304 Hongtao Liu changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/117438] x86's pass_align_tight_loops may cause performance regression in nested loops

2024-11-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117438 --- Comment #5 from Hongtao Liu --- I reproduced a 30% regression on CLX. The aligned case is more frontend-bound; this is uarch-specific, so I will make it a uarch tune.

[Bug target/117416] [15 Regression] ICE: in gen_prefetch, at config/i386/i386.md:28541 with __builtin_ia32_prefetch() by r15-4833-ge9ab41b79933d4

2024-11-05 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117416 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED

[Bug target/117438] x86's pass_align_tight_loops may cause performance regression in nested loops

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117438 --- Comment #4 from Hongtao Liu --- (In reply to Mayshao-oc from comment #0) > Created attachment 59530 [details] > gcc -O1 loop.c > > Pass_align_tight_loops align the inner loop aggressively, this may cause > significant performance regression

[Bug target/117304] ICE: in emit_move_insn, at expr.cc:4633 with -mavx10.1 and __builtin_ia32_cvtudq2ps512_mask()

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117304 --- Comment #4 from Hongtao Liu --- $: grep AVX512F i386-builtin.def | grep -v EVEX512 | grep -e V8DI -e V8DF -e V16SI -e V16SF -e V32HI -e V32HF -e V32BF -e V64QI BDESC (OPTION_MASK_ISA_AVX512F, 0, CODE_FOR_unspec_fix_truncv8dfv8si2_mask_roun

[Bug target/117438] x86's pass_align_tight_loops may cause performance regression in nested loops

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117438 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug target/117416] [15 Regression] ICE: in gen_prefetch, at config/i386/i386.md:28541 with __builtin_ia32_prefetch() by r15-4833-ge9ab41b79933d4

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117416 Hongtao Liu changed: What|Removed |Added Resolution|DUPLICATE |--- Status|RESOLVED

[Bug bootstrap/117407] [15 regression] bootstrap fails after r15-4847-g79a75b1f551821

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117407 Hongtao Liu changed: What|Removed |Added CC||zsojka at seznam dot cz --- Comment #5 fr

[Bug target/117416] [15 Regression] ICE: in gen_prefetch, at config/i386/i386.md:28541 with __builtin_ia32_prefetch() by r15-4833-ge9ab41b79933d4

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117416 Hongtao Liu changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW

[Bug target/117416] [15 Regression] ICE: in gen_prefetch, at config/i386/i386.md:28541 with __builtin_ia32_prefetch() by r15-4833-ge9ab41b79933d4

2024-11-04 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117416 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #

[Bug tree-optimization/117323] GCC failed to optimize value / 128 to value >> 7 when the range of value must be positive

2024-10-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117323 --- Comment #6 from Hongtao Liu --- (In reply to Andrew Pinski from comment #5) > Note the reasoning for the difference in arguments between aarch64 and > x86_64 is that x86_64 defines PUSH_ARGS_REVERSED to be 1. Interesting define min/max as m

[Bug tree-optimization/117323] GCC failed to optimize value / 128 to value >> 7 when the range of value must be positive

2024-10-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117323 --- Comment #4 from Hongtao Liu --- Another missed optimization is that GCC fails to recognize max_expr for sum1, which generates a lot of pack/unpack code in the vectorizer. prephitmp_66 = (int) _8; # DEBUG a => NULL # DEBUG b => NULL # DEBUG a

[Bug target/117318] ICE: in expand_simple_unop, at optabs.cc:2585 with __builtin_ia32_pmovusqb512mem_mask()

2024-10-30 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117318 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/117301] Many AVX10 tests fail

2024-10-29 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117301 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug middle-end/117323] New: GCC failed to optimize value / 128 to value >> 7 when the range of value must be positive

2024-10-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117323 Bug ID: 117323 Summary: GCC failed to optimize value / 128 to value >> 7 when the range of value must be positive Product: gcc Version: 15.0 Status: UNCONFIRMED
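As a small illustration of why the value range matters for this transformation, the sketch below compares signed division by 128 with an arithmetic right shift by 7. The function names div128 and shr7 are hypothetical and are not taken from the bug report. Right-shifting a negative signed value is implementation-defined in C; the comments assume GCC's documented arithmetic shift.

```c
/* Signed division truncates toward zero. */
static int div128(int v)
{
    return v / 128;
}

/* Arithmetic right shift rounds toward negative infinity
   (assumption: GCC's behavior for negative signed values). */
static int shr7(int v)
{
    return v >> 7;
}
```

The two functions agree for every non-negative input, for example 1000 yields 7 in both. They differ for negative inputs that are not multiples of 128: with GCC, div128(-1) is 0 while shr7(-1) is -1. A compiler can therefore replace the division with a single shift only when it can prove the value is non-negative.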

[Bug target/117318] ICE: in expand_simple_unop, at optabs.cc:2585 with __builtin_ia32_pmovusqb512mem_mask()

2024-10-27 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117318 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org E

[Bug target/117301] Many AVX10 tests fail

2024-10-26 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117301 --- Comment #3 from Hongtao Liu --- Yes, the new instructions are still under review for binutils and have not landed on binutils trunk, but GCC's check_effective_target_avx10_2 checks the target with the "old" _mm256_mask_vpdpbssd_epi32. The problem should be gone when

[Bug target/117240] ICE: in copy_to_mode_reg, at explow.cc:657 with __builtin_ia32_vaesenc_v32qi() or __builtin_ia32_vaesenc_v64qi()

2024-10-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117240 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/101017] ICE: Segmentation fault, convert_memory_address_addr_space_1 with vector_size(32) and target_clone arch=core-avx2/default

2024-10-23 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101017 --- Comment #11 from Hongtao Liu --- (In reply to David Binderman from comment #10) > Did this ever happen ? > > Similar test case gcc/testsuite/gcc.target/i386/avx10_1-26.c > still seems to cause a crash: > > testsuite $ ~/gcc/results/bin/gcc

[Bug target/117232] EQ/NE comparison between avx512 kmask and -1 can be optimized with kxortest with checking CF when using cmov

2024-10-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232 --- Comment #4 from Hongtao Liu --- (In reply to Andrew Pinski from comment #0) > This is expansion of PR 113609 which showed when I improved phiopt's factor > operations to handle more than just 1 operand operations. > > New reduced testcase t

[Bug target/117232] EQ/NE comparison between avx512 kmask and -1 can be optimized with kxortest with checking CF when using cmov

2024-10-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/64700] Sink common code through PHI

2024-10-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64700 Bug 64700 depends on bug 117232, which changed state. Bug 117232 Summary: EQ/NE comparison between avx512 kmask and -1 can be optimized with kxortest with checking CF when using cmov https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232

[Bug target/117240] ICE: in copy_to_mode_reg, at explow.cc:657 with __builtin_ia32_vaesenc_v32qi() or __builtin_ia32_vaesenc_v64qi()

2024-10-22 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117240 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org

[Bug target/117159] kmovw storing to memory is assumed to zero-extend

2024-10-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117159 Hongtao Liu changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|---

[Bug target/117232] EQ/NE comparison between avx512 kmask and -1 can be optimized with kxortest with checking CF when using cmov

2024-10-20 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117232 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org Las

[Bug target/116940] [15 Regression] wrong code with -O -mavx512vl and vector compare and negation since r15-1742

2024-10-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116940 Hongtao Liu changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-10-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 117072, which changed state. Bug 117072 Summary: [15 Regression] FAIL: gcc.target/i386/cond_op_fma_{float,double,_Float16}-1.c since r15-3509-gd34cda72098867 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072

[Bug target/117072] [15 Regression] FAIL: gcc.target/i386/cond_op_fma_{float,double,_Float16}-1.c since r15-3509-gd34cda72098867

2024-10-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117072 Hongtao Liu changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug testsuite/115365] [15 regression] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-10-16 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365 Hongtao Liu changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|---

[Bug target/117159] kmovw storing to memory is assumed to zero-extend

2024-10-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117159 --- Comment #2 from Hongtao Liu --- typedef __attribute__((__vector_size__ (4))) unsigned char W; typedef __attribute__((__vector_size__ (64))) int V; typedef __attribute__((__vector_size__ (64))) long long Vq; W w; V v; Vq vq; static inline W

[Bug target/117159] kmovw storing to memory is assumed to zero-extend

2024-10-15 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117159 Hongtao Liu changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |liuhongt at gcc dot gnu.org

[Bug target/117082] [15 Regression] FAIL: gcc.target/i386/stack-check-17.c since r15-1619-g3b9b8d6cfdf593

2024-10-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117082 Hongtao Liu changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW

[Bug target/117081] [15 Regression] FAIL: gcc.target/i386/pr91384.c since r15-1619-g3b9b8d6cfdf593

2024-10-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117081 --- Comment #5 from Hongtao Liu --- *** Bug 117082 has been marked as a duplicate of this bug. ***

[Bug target/79786] [12/13/14/15 Regression] ICE tree check: expected class 'type', have 'declaration' (var_decl) in iamcu_alignment, at config/i386/i386.c:30263

2024-10-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79786 Hongtao Liu changed: What|Removed |Added CC||liuhongt at gcc dot gnu.org --- Comment #7

[Bug target/117116] [15 regression] error: unrecognizable insn: with -march=znver3

2024-10-14 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117116 Hongtao Liu changed: What|Removed |Added Assignee|liuhongt at gcc dot gnu.org|uros at gcc dot gnu.org --- Commen
