So what is happening here is that after r15-268-g9dbff9c05520a7, a move
instruction still exists after combine, and the register allocator chooses a
different register allocation order for the xor. Because the input operand of
lzcntq is then not the same as its output operand, an extra xor is emitted to
work around a CPU erratum (lzcnt's false dependency on its destination
register).
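For reference, here is a minimal C sketch of what __builtin_clrsbl computes, written in the sar / xor / count-leading-zeros form that the two scan-assembler-times directives in this test appear to count (one sar or cltd and one xor per function). This assumes GCC's expansion follows this shape; clrsbl_ref is a hypothetical name used only for illustration and is not part of GCC.

```c
#include <limits.h>

/* Hypothetical reference for __builtin_clrsbl: the number of leading bits
   that are copies of the sign bit, not counting the sign bit itself. */
static int clrsbl_ref(long x)
{
    const int width = (int)(sizeof(long) * CHAR_BIT);

    /* Arithmetic right shift by width - 1 gives 0 or -1 (the "sar" step).
       Right-shifting a negative value is implementation-defined in ISO C,
       but GCC defines it as an arithmetic shift. */
    long sign = x >> (width - 1);

    /* XOR with the sign mask turns leading copies of the sign bit into
       zeros (the "xor" step). */
    unsigned long t = (unsigned long)(x ^ sign);

    /* Count leading zeros (the lzcnt step). __builtin_clzl is undefined
       for 0, so the all-sign-bits case is handled explicitly. */
    int lz = (t == 0) ? width : __builtin_clzl(t);

    return lz - 1;
}
```

The int variant used by bar (__builtin_clrsb) follows the same pattern with a 32-bit width.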
This fixes the testcase by loading the value from a pointer instead of using
the function argument directly. The register allocator then has more freedom:
the loaded value has no hard register associated with it (unlike the incoming
argument in rdi), so it can be placed in eax right away.

Tested for both -m32 and -m64 on x86_64-linux-gnu.

gcc/testsuite/ChangeLog:

	PR testsuite/115028
	* gcc.target/i386/pr101950-2.c:

Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
---
 gcc/testsuite/gcc.target/i386/pr101950-2.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr101950-2.c b/gcc/testsuite/gcc.target/i386/pr101950-2.c
index 896f1b46414..ccc361e3a46 100644
--- a/gcc/testsuite/gcc.target/i386/pr101950-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr101950-2.c
@@ -6,14 +6,19 @@
 /* { dg-final { scan-assembler-times "\txor\[ql]\t" 2 } } */
 /* { dg-final { scan-assembler-times "\tsar\[ql]\t|\tcltd" 2 } } */
 
+/* Use pointers to avoid register allocation difference due to argument
+   and return register being different and the difference in selecting eax
+   for one the result of the xor vs selecting rdi due to the order of the
+   shift vs the not shift.  */
+
 int
-foo (long x)
+foo (long *x)
 {
-  return __builtin_clrsbl (x);
+  return __builtin_clrsbl (*x);
 }
 
 int
-bar (int x)
+bar (int *x)
 {
-  return __builtin_clrsb (x);
+  return __builtin_clrsb (*x);
 }
-- 
2.43.0