https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111930
Bug ID: 111930
Summary: aarch64: SME is still not supported.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
--- Comment #3 from d_vampile ---
(In reply to Andrew Pinski from comment #1)
> First off the performance difference is due to micro-arch issues with
> unaligned stores of 256 bits.
>
> Also iirc rte_mov128blocks is tuned at copying blocks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
Bug ID: 111354
Summary: [7/10/12 regression] The instructions of the DPDK demo
program are different and run time increases.
Product: gcc
Version: 10.3.0
Status: UNCONFI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
--- Comment #9 from d_vampile ---
(In reply to Andrew Pinski from comment #8)
> (In reply to d_vampile from comment #7)
> > In terms of runtime, this code is the best.
>
> Depends on the core
> What does -mtune=native provide for the core
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
--- Comment #7 from d_vampile ---
(In reply to Andrew Pinski from comment #3)
> GCC 11+ produces:
> .L3:
> vmovdqu (%rsi), %ymm2
> vmovdqu 32(%rsi), %ymm1
subq $-128, %rdi
subq $-128, %rsi
> vmov
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
--- Comment #6 from d_vampile ---
GCC 7.3.0 produces:
extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__,
__artificial__))
_mm256_loadu_si256 (__m256i_u const *__P)
{
return *__P;
401170: c5 fa 6f 1e v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
--- Comment #5 from d_vampile ---
According to the analysis, the following two PRs may cause the problems
described above:
PR1: https://github.com/gcc-mirror/gcc/commit/dd9b529f08c3c6064c37234922d298336d78caf7
PR2:https://github.com/gcc-mirror/gcc/comm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
--- Comment #2 from d_vampile ---
The program built with GCC 7.3.0 uses vmovups and vmovups instructions, but the
program built with GCC 10.3.0 uses only vmovups instructions. In addition, the
order of the two assembly instructions is not consistent.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
d_vampile changed:
What|Removed |Added
CC||d_vampile at 163 dot com
--- Comment #1 fro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111332
Bug ID: 111332
Summary: Using GCC7.3.0 and GCC10.3.0 to compile the same test
case, assembler file instructions are different and
performance fallback is obvious.
Product:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059
d_vampile changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059
d_vampile changed:
What|Removed |Added
Version|10.3.1 |10.3.0
Target|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059
Bug ID: 110059
Summary: When SPEC is used to test the GCC (10.3.1), the test
result of subitem 548 fluctuates abnormally.
Product: gcc
Version: 10.3.1
Status: UNCONFIRME
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110023
--- Comment #2 from d_vampile ---
(In reply to Andrew Pinski from comment #1)
> This is almost definitely an aarch64 cost model issue ...
Do you mean that the vectorized cost_model of the underlying hardware causes
the policy of not peeling the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
--- Comment #2 from d_vampile ---
(In reply to Jakub Jelinek from comment #1)
> Note, any benchmarking for speed with -O rather than -O2/-O3 is
> intentionally missing various optimizations which can greatly improve
> performance.
O0 does miss
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
--- Comment #3 from d_vampile ---
(In reply to Andrew Pinski from comment #2)
> Which core is showing the difference here?
> Because some cores I know of, loading/storing using the FP registers is
> actually one cycle slower than using GPRs.
Yes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110026
Bug ID: 110026
Summary: [Bug] 5% performance drop on important benchmark after
r260951.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
d_vampile changed:
What|Removed |Added
Status|WAITING |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
--- Comment #1 from d_vampile ---
It can be seen that the vector register (D0) is used before the modification,
and the general-purpose register (X0) is used after the modification.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110024
Bug ID: 110024
Summary: [Bug] 5% performance drop on important benchmark after
r260951.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110023
Bug ID: 110023
Summary: [10.3 Regression] 10% performance drop on important
benchmark after r247544.
Product: gcc
Version: 10.3.0
Status: UNCONFIRMED
Severity:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91246
d_vampile changed:
What|Removed |Added
CC||d_vampile at 163 dot com
--- Comment #6 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88398
d_vampile changed:
What|Removed |Added
CC||d_vampile at 163 dot com
--- Comment #48 fro
23 matches