https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
--- Comment #12 from Maxim Egorushkin ---
gcc-13 and gcc-14 no longer align the last byte of a loop to the last byte of an
L1i cache line when compiled with `-march=native -mtune=native` on Zen3 and
Zen4 CPUs.
I remember gcc-11 or gcc-12 aligni
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
Maxim Egorushkin changed:
What |Removed |Added
CC   |        |maxim.yegorushkin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #16 from Maxim Egorushkin ---
(In reply to Maxim Egorushkin from comment #14)
> (In reply to Andrew Pinski from comment #6)
>
> > It happens more often with vector instructions/registers due to the
> > different "modes" of the regis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #15 from Maxim Egorushkin ---
(In reply to Maxim Egorushkin from comment #14)
> (In reply to Andrew Pinski from comment #6)
>
> > It happens more often with vector instructions/registers due to the
> > different "modes" of the regis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #14 from Maxim Egorushkin ---
(In reply to Andrew Pinski from comment #6)
> It happens more often with vector instructions/registers due to the
> different "modes" of the registers that it can hold (subregs).
That's right, my empir
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #13 from Maxim Egorushkin ---
(In reply to Andrew Pinski from comment #11)
> Let me try again:
>
> So we have:
> __v4di v4 = ymm0
> __v2di tmp = _mm256_extracti128_si256(v4, 1); // vextracti128
> __v2di tmp1 = _mm256_castsi256_si128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #10 from Maxim Egorushkin ---
(In reply to Andrew Pinski from comment #9)
> (In reply to Maxim Egorushkin from comment #8)
> > (In reply to Andrew Pinski from comment #6)
> > > If you look at the difference between the 2 functions.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #8 from Maxim Egorushkin ---
(In reply to Andrew Pinski from comment #6)
> If you look at the difference between the 2 functions.
> vextracti128 xmm1, ymm0, 0x1
>
> vs
> vmovdqa xmm1, xmm0
> vextracti128
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #5 from Maxim Egorushkin ---
(In reply to Andrew Pinski from comment #2)
> Register allocation is NP complete problem after all.
The vmovdqa instruction probably intends to turn a ymm register into an xmm register
by zeroing all the high
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
--- Comment #4 from Maxim Egorushkin ---
To add more context, I use Muła's AVX2 popcount function from
https://arxiv.org/abs/1611.07612
It produces 4 counts in a v4di register, which should be summed into a scalar
total. That is what brought me here.
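
The reduction described above (four 64-bit counts in one vector, summed into a scalar) can be sketched with the GCC/Clang vector extensions. This is only an illustration, not the code from the bug report: the type alias `v4di_t`, the helper name `horizontal_sum`, and the particular order of the addition terms are all assumptions made for the example, and nothing here claims which ordering triggers the extra instructions.

```cpp
#include <cstdint>

// GCC/Clang vector extension: four 64-bit lanes in one 32-byte vector,
// analogous to the __v4di type mentioned in the thread.
// The alias name v4di_t is hypothetical.
typedef std::int64_t v4di_t __attribute__((vector_size(32)));

// Hypothetical helper (not from the bug report): sums the four lanes
// into one scalar by adding lanes pairwise, then adding the two pairs.
// Taking the vector by const reference avoids passing a 32-byte vector
// by value on targets built without AVX enabled.
inline std::int64_t horizontal_sum(const v4di_t& v) {
    return (v[0] + v[2]) + (v[1] + v[3]);
}
```

For example, a vector holding the made-up counts `{3, 5, 7, 11}` sums to 26. Inspecting the generated assembly for different term orders (for instance with `-O3 -mavx2 -S`) is one way to compare what the compiler emits.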
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118984
Bug ID: 118984
Summary: Unnecessary instructions are emitted when addition
terms are in an unfortunate order
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
S
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899
--- Comment #15 from Maxim Egorushkin ---
(In reply to Maxim Egorushkin from comment #14)
> (In reply to Marco Elver from comment #0)
> > On X86-64 the callee preserves all general purpose registers, except for
> > R11. R11 can be used as a scra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899
Maxim Egorushkin changed:
What |Removed |Added
CC   |        |maxim.yegorushkin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764
--- Comment #8 from Maxim Egorushkin ---
It was supposed to be one comment, but I kept clicking the "save changes" button
because it provided no visual feedback that the comment was being posted.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764
--- Comment #7 from Maxim Egorushkin ---
> Can you provide the preprocessed source? Since I can't seem to reproduce it
> with the above.
It should be compiled with "-pthread -std=gnu++14 -O3 -Wall -Wextra -Werror" to
trigger the warning/error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764
--- Comment #6 from Maxim Egorushkin ---
> Can you provide the preprocessed source? Since I can't seem to reproduce it
> with the above.
It should be compiled with "-pthread -std=gnu++14 -O3 -Wall -Wextra -Werror" to
trigger the warning/error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764
--- Comment #4 from Maxim Egorushkin ---
Full context: https://github.com/max0x7ba/atomic_queue/issues/55
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110764
Maxim Egorushkin changed:
What |Removed |Added
CC   |        |maxim.yegorushkin at gmail dot com
18 matches