[Bug target/111930] New: aarch64: SME is still not supported.

2023-10-23 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111930 Bug ID: 111930 Summary: aarch64: SME is still not supported. Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target

[Bug target/111354] [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.

2023-09-08 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354 --- Comment #3 from d_vampile --- (In reply to Andrew Pinski from comment #1) > First off the performance difference is due to micro-arch issues with > unaligned stores of 256 bits. > > Also iirc rte_mov128blocks is tuned at copying blocks

[Bug target/111354] New: [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases.

2023-09-08 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354 Bug ID: 111354 Summary: [7/10/12 regression] The instructions of the DPDK demo program are different and run time increases. Product: gcc Version: 10.3.0 Status: UNCONFI

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 --- Comment #9 from d_vampile --- (In reply to Andrew Pinski from comment #8) > (In reply to d_vampile from comment #7) > > In terms of runtime, this code is the best. > > Depends on the core > What does -mtune=native provide for the core

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 --- Comment #7 from d_vampile --- (In reply to Andrew Pinski from comment #3) > GCC 11+ produces: > .L3: > vmovdqu (%rsi), %ymm2 > vmovdqu 32(%rsi), %ymm1 > subq $-128, %rdi > subq $-128, %rsi > vmov

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 --- Comment #6 from d_vampile --- GCC 7.3.0 produces: extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_loadu_si256 (__m256i_u const *__P) { return *__P; 401170: c5 fa 6f 1e v

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 --- Comment #5 from d_vampile --- According to the analysis, the following two PRs may cause the preceding problems: PR1: https://github.com/gcc-mirror/gcc/commit/dd9b529f08c3c6064c37234922d298336d78caf7 PR2: https://github.com/gcc-mirror/gcc/comm

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 --- Comment #2 from d_vampile --- The gcc7.3.0 program uses vmovups and vmovups instructions, but the gcc10.3.0 program only uses vmovups instructions. In addition, the order of the two assembly instructions is not consistent.

[Bug target/111332] Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 d_vampile changed: What|Removed |Added CC||d_vampile at 163 dot com --- Comment #1 fro

[Bug target/111332] New: Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious.

2023-09-07 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332 Bug ID: 111332 Summary: Using GCC7.3.0 and GCC10.3.0 to compile the same test case, assembler file instructions are different and performance fallback is obvious. Product:

[Bug target/110059] When SPEC is used to test the GCC (10.3.1), the test result of subitem 548 fluctuates abnormally.

2023-06-04 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059 d_vampile changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/110059] When SPEC is used to test the GCC (10.3.1), the test result of subitem 548 fluctuates abnormally.

2023-05-31 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059 d_vampile changed: What|Removed |Added Version|10.3.1 |10.3.0 Target|

[Bug target/110059] New: When SPEC is used to test the GCC (10.3.1), the test result of subitem 548 fluctuates abnormally.

2023-05-31 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059 Bug ID: 110059 Summary: When SPEC is used to test the GCC (10.3.1), the test result of subitem 548 fluctuates abnormally. Product: gcc Version: 10.3.1 Status: UNCONFIRME

[Bug target/110023] 10% performance drop on important benchmark after r247544.

2023-05-30 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110023 --- Comment #2 from d_vampile --- (In reply to Andrew Pinski from comment #1) > This is almost definitely an aarch64 cost model issue ... Do you mean that the vectorized cost_model of the underlying hardware causes the policy of not peeling the

[Bug target/110026] [Bug] 5% performance drop on important benchmark after r260951.

2023-05-30 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026 --- Comment #2 from d_vampile --- (In reply to Jakub Jelinek from comment #1) > Note, any benchmarking for speed with -O rather than -O2/-O3 is > intentionally missing various optimizations which can greatly improve > performance. O0 does miss

[Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024 --- Comment #3 from d_vampile --- (In reply to Andrew Pinski from comment #2) > Which core is showing the difference here? > Because some cores I know of, loading/storing using the FP registers is > actually one cycle slower than using GPRs. Yes

[Bug tree-optimization/110026] New: [Bug] 5% performance drop on important benchmark after r260951.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026 Bug ID: 110026 Summary: [Bug] 5% performance drop on important benchmark after r260951. Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity: normal

[Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024 d_vampile changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug target/110024] [Bug] 5% performance drop on important benchmark after r260951.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024 --- Comment #1 from d_vampile --- It can be seen that the vector register (D0) is used before the modification, and the general-purpose register (X0) is used after the modification.

[Bug target/110024] New: [Bug] 5% performance drop on important benchmark after r260951.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024 Bug ID: 110024 Summary: [Bug] 5% performance drop on important benchmark after r260951. Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity: normal

[Bug tree-optimization/110023] New: [10.3 Regression] 10% performance drop on important benchmark after r247544.

2023-05-29 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110023 Bug ID: 110023 Summary: [10.3 Regression] 10% performance drop on important benchmark after r247544. Product: gcc Version: 10.3.0 Status: UNCONFIRMED Severity:

[Bug tree-optimization/91246] vectorization failure for a small loop to search array element

2022-03-14 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246 d_vampile changed: What|Removed |Added CC||d_vampile at 163 dot com --- Comment #6 from

[Bug tree-optimization/88398] vectorization failure for a small loop to do byte comparison

2022-03-14 Thread d_vampile at 163 dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398 d_vampile changed: What|Removed |Added CC||d_vampile at 163 dot com --- Comment #48 fro