[Bug target/91796] Sub-optimal YMM register allocation.

2021-08-19 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 Hongtao.liu changed: What|Removed |Added CC||crazylht at gmail dot com --- Comment #10

[Bug target/91796] Sub-optimal YMM register allocation.

2021-08-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #9 from Andrew Pinski --- In GCC 5-8 we produced: vpcmpeqd %ymm2, %ymm2, %ymm2 vpsllq $63, %ymm2, %ymm2 vandnpd %ymm1, %ymm2, %ymm1 vandpd %ymm2, %ymm0, %ymm0 vorpd %ymm1, %ymm0, %ymm
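
The instruction sequence quoted in comment #9 reads like a sign transfer: build a sign-bit mask, clear the sign of one operand, isolate the sign of the other, and OR the two together. The preview is cut off before the final destination register, so the following is only a scalar sketch of one 64-bit lane under that reading, assuming IEEE 754 doubles. The names to_bits, from_bits and sign_transfer are hypothetical and do not come from the bug report.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing issues). */
static uint64_t to_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }

/* Reinterpret a 64-bit pattern as a double. */
static double from_bits(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical scalar model of one lane of the quoted sequence. */
static double sign_transfer(double sign_src, double mag_src)
{
    uint64_t mask = ~UINT64_C(0) << 63;          /* vpcmpeqd + vpsllq $63: sign-bit mask */
    uint64_t mag  = ~mask & to_bits(mag_src);    /* vandnpd: clear the sign bit          */
    uint64_t sign =  mask & to_bits(sign_src);   /* vandpd:  keep only the sign bit      */
    return from_bits(sign | mag);                /* vorpd:   combine sign and magnitude  */
}
```

Under these assumptions, sign_transfer(-1.0, 3.5) yields -3.5 and sign_transfer(2.0, -3.5) yields 3.5, the same results copysign would give for those inputs.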

[Bug target/91796] Sub-optimal YMM register allocation.

2020-05-23 Thread maxim.yegorushkin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #8 from Maxim Egorushkin --- Another example https://stackoverflow.com/questions/61975526/gcc-optimization-better-at-o0-than-o3

[Bug target/91796] Sub-optimal YMM register allocation.

2020-01-29 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 Martin Liška changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed|

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-11 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #7 from Marc Glisse --- (In reply to Maxim Egorushkin from comment #3) > It seems to me that register allocation has been a weak spot in gcc for > years. Most such testcases show issues with arguments/return in very small functions,

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #6 from Jakub Jelinek --- And as for the constant, it seems ICC also emits just a constant load from memory instead of trying two instructions, and clang, while it uses a broadcast to save .rodata, doesn't use two instructions either:

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #5 from Jakub Jelinek --- Wasn't the whole point of Segher's combiner changes not to propagate hard registers into instructions to leave the RA more in control? Propagating something in some other pass would undo that change.

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-10 Thread hjl.tools at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #4 from H.J. Lu --- Since fwprop.c has static rtx propagate_rtx (rtx x, machine_mode mode, rtx old_rtx, rtx new_rtx, bool speed) { rtx tem; bool collapsed; int flags; if (REG_P (new_rtx) && REGNO (new_rtx) < F

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-10 Thread maxim.yegorushkin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #3 from Maxim Egorushkin --- It seems to me that register allocation has been a weak spot in gcc for years. gcc often allocates registers in such a way that extra register moves are necessary, compared to competition, like in this p

[Bug target/91796] Sub-optimal YMM register allocation.

2019-10-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 Jakub Jelinek changed: What|Removed |Added CC||hjl.tools at gmail dot com,

[Bug target/91796] Sub-optimal YMM register allocation.

2019-09-20 Thread maxim.yegorushkin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91796 --- Comment #1 from Maxim Egorushkin --- In addition, the code tries to generate avx_signbit using 2 instructions: comparison vpcmpeqq and shift vpsllq, to avoid loading anything from memory. However, the compiler replaces the code with loading a
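
To make the two-instruction idea in comment #1 concrete, here is a scalar sketch of what happens in a single 64-bit lane: comparing a register with itself sets every bit, and shifting left by 63 leaves only the sign bit. It is not the reporter's code; the helper names are hypothetical, and the comparison with -0.0 assumes IEEE 754 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vpcmpeqq reg, reg, reg: a value always equals
   itself, so the lane ends up with all 64 bits set. */
static uint64_t lane_all_ones(void) { return ~UINT64_C(0); }

/* Hypothetical model of vpsllq $63: shifting the all-ones lane left by 63
   leaves only the most significant (sign) bit. */
static uint64_t lane_sign_mask(void) { return lane_all_ones() << 63; }

/* Bit pattern of a double, copied out with memcpy to avoid aliasing issues. */
static uint64_t double_bits(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
```

The resulting mask, 0x8000000000000000, has the same bit pattern as -0.0, which is the constant a compiler could otherwise load from memory.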