On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
There's a peephole2, submitted in the 1990s, which splits cmp mem, 0 into load mem, reg + test reg, reg. I don't know the exact reason why GCC does this.

For the latest x86 processors, ciscization should help the processor frontend and also code size; for the processor backend, the two forms should be the same (they have the same uops).

So the patch deletes the peephole2, and also modifies another splitter to generate more cmp mem, 0 for 32-bit targets. It will help instruction fetch.

minmax-1.c, minmax-2.c, minmax-10.c and pr96891.c are supposed to scan that there's no comparison to 1 or -1, so adjust the testcases, since under 32-bit targets we now generate cmp mem, 0 instead of load + test. Similar for pr78035.c.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No performance impact for SPEC2017 on ICX/Znver3.

Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.md (*<code><mode>3_1): Replace
	register_operand with nonimmediate_operand for operand 1.
	Also force_reg it when mode is QImode.
	(define_peephole2): Deleted related peephole2.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/minmax-1.c: Scan-assembler-not for cmp with 1 or
	-1, also don't scan-assembler test for ia32.
	* gcc.target/i386/minmax-10.c: Ditto.
	* gcc.target/i386/minmax-2.c: Ditto.
	* gcc.target/i386/pr78035.c: Ditto.
	* gcc.target/i386/pr96861.c: Scan either cmp or test 3 times.
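For readers following along, here is a minimal C sketch of the kind of source that exercises this pattern: a comparison of a value in memory against zero. The function name `pick` is hypothetical and does not come from the patch or the testsuite. The assembly in the comments only illustrates the two shapes discussed above; the code a given GCC actually emits depends on version, target and flags.

```c
/* Hypothetical example, not taken from the patch or its testcases.
   The condition compares a value loaded from memory against zero.  */
int pick(const int *flag, int a, int b)
{
    /* Shape produced by the old peephole2 (illustrative only):
           movl  (mem), %eax
           testl %eax, %eax
       Shape the patch aims for (illustrative only):
           cmpl  $0, (mem)
       Both set the flags the same way for the branch below.  */
    if (*flag == 0)
        return a;
    return b;
}
```

Compiling this with something like `gcc -O2 -m32 -S` before and after the change would be one way to compare the two sequences, assuming the optimizer keeps the compare as a separate instruction.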
It was almost certainly for PPro/P2 given it was rth's work from 1999. Probably should have been conditionalized on PPro/P2 at the time. No worries losing it now...
Jeff