The regression was mostly due to a failed assumption in the test-case (that a minimal solution must match its assembly-code patterns), but also to a different suboptimal sequence generated after the reload change.
The committed patch below fixes the regressed code and the test-case, and tweaks the comment to match the current state of both the peephole2 usage and the gcc version. Tested cross to cris-elf. I also tested that the updated test-case matches the 4.7 output, in case the reload change is reverted. :) (I have no particular reason to believe that'd happen, though.)

gcc/testsuite:
	PR target/53156
	* gcc.target/cris/peep2-andu2.c: Tweak expected assembly code to
	match current output and cover new peephole2 pattern.

gcc:
	PR target/53156
	* config/cris/cris.md (andqu): New peephole2.
	(andu): Tweak head comment.

Index: gcc/testsuite/gcc.target/cris/peep2-andu2.c
===================================================================
--- gcc/testsuite/gcc.target/cris/peep2-andu2.c	(revision 186934)
+++ gcc/testsuite/gcc.target/cris/peep2-andu2.c	(working copy)
@@ -1,13 +1,20 @@
 /* { dg-do assemble } */
-/* { dg-final { scan-assembler "movu.w \\\$r10,\\\$" } } */
-/* { dg-final { scan-assembler "and.w 2047,\\\$" } } */
+/* { dg-final { scan-assembler "movu.w \\\$r10,\\\$|movu.w 2047," } } */
+/* { dg-final { scan-assembler "and.w 2047,\\\$|and.d \\\$r10," } } */
 /* { dg-final { scan-assembler-not "move.d \\\$r10,\\\$" } } */
-/* { dg-final { scan-assembler "movu.b \\\$r10,\\\$" } } */
-/* { dg-final { scan-assembler "and.b 95,\\\$" } } */
+/* { dg-final { scan-assembler "movu.b \\\$r10,\\\$|movu.b 95," } } */
+/* { dg-final { scan-assembler "and.b 95,\\\$|and.d \\\$r10," } } */
 /* { dg-final { scan-assembler "andq -2,\\\$" } } */
+/* { dg-final { scan-assembler-not "movu.b 254,\\\$" } } */
 /* { dg-options "-O2 -save-temps" } */
 
-/* Test the "andu" peephole2 trivially, register operand.  */
+/* Originally used to test the "andu" peephole2 trivially, register operand.
+   Due to reload changes (r186861), the suboptimal sequence isn't
+   generated and the peephole2 doesn't trig for this trivial code
+   anymore.  Another minimal sequence is generated, where the constant
+   is loaded to a free register first.  Instead another case is exposed;
+   handled by the "andqu" peephole2, trigged by and_peep2_q (the andq
+   and scan-assembler-not-movu.b lines above).  */
 
 unsigned int
 and_peep2_hi (unsigned int y, unsigned int *x)
Index: gcc/config/cris/cris.md
===================================================================
--- gcc/config/cris/cris.md	(revision 186934)
+++ gcc/config/cris/cris.md	(working copy)
@@ -4936,17 +4936,17 @@ (define_peephole2 ; op3 (peephole casesi
   "operands[7]
    = rtx_equal_p (operands[3], operands[0]) ? operands[4] : operands[3];")
 
-;; I cannot tell GCC (2.1, 2.7.2) how to correctly reload an instruction
-;; that looks like
-;;   and.b some_byte,const,reg_32
-;; where reg_32 is the destination of the "three-address" code optimally.
+;; There seems to be no other way to make GCC (including 4.8/trunk at
+;; r186932) optimally reload an instruction that looks like
+;;   and.d reg_or_mem,const_32__65535,other_reg
+;; where other_reg is the destination.
 ;; It should be:
-;;   movu.b some_byte,reg_32
-;;   and.b const,reg_32
+;;   movu.[bw] reg_or_mem,reg_32
+;;   and.[bw] trunc_int_for_mode([bw], const_32__65535),reg_32 ;; or andq
 ;; but it turns into:
-;;   move.b some_byte,reg_32
-;;   and.d const,reg_32
-;; Fix it here.
+;;   move.d reg_or_mem,reg_32
+;;   and.d const_32__65535,reg_32
+;; Fix it with these two peephole2's.
 ;; Testcases: gcc.dg/cris-peep2-andu1.c gcc.dg/cris-peep2-andu2.c
 
 (define_peephole2 ; andu (casesi+45)
@@ -4982,6 +4982,36 @@ (define_peephole2 ; andu (casesi+45)
                  GEN_INT (trunc_int_for_mode (INTVAL (operands[3]),
                                               amode == SImode
                                               ? QImode : amode)));
+})
+
+;; Since r186861, gcc.dg/cris-peep2-andu2.c trigs this pattern, with which
+;; we fix up e.g.:
+;;  movu.b 254,$r9.
+;;  and.d $r10,$r9
+;; into:
+;;  movu.b $r10,$r9
+;;  andq -2,$r9.
+;; Only do this for values fitting the quick immediate operand.
+(define_peephole2 ; andqu (casesi+46)
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operand:SI 1 "const_int_operand"))
+   (set (match_dup 0)
+        (and:SI (match_dup 0) (match_operand:SI 2 "nonimmediate_operand")))]
+  ;; Since the size of the memory access will be made different here,
+  ;; don't do this for a volatile access or a post-incremented address.
+  "satisfies_constraint_O (operands[1])
+   && !side_effects_p (operands[2])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])"
+  [(set (match_dup 0) (match_dup 3))
+   (set (match_dup 0) (and:SI (match_dup 0) (match_dup 4)))]
+{
+  enum machine_mode zmode = INTVAL (operands[1]) <= 255 ? QImode : HImode;
+  rtx op1
+    = (REG_S_P (operands[2])
+       ? gen_rtx_REG (zmode, REGNO (operands[2]))
+       : adjust_address (operands[2], zmode, 0));
+  operands[3] = gen_rtx_ZERO_EXTEND (SImode, op1);
+  operands[4] = GEN_INT (trunc_int_for_mode (INTVAL (operands[1]), QImode));
 })
 
 ;; Try and avoid GOTPLT reads escaping a call: transform them into

brgds, H-P
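As a hedged illustration of why the andqu rewrite preserves the result, the C sketch below models both instruction sequences on 32-bit values. All helper names (trunc_to_qi, andqu_rewrite, original_seq, rewritten_seq, andqu_mismatches) are hypothetical and not GCC code. The eligibility rule is an assumption derived from the arithmetic alone, not a copy of the real "O" constraint, and may accept a different set of constants; andq's immediate is assumed to span -32..31. Like the patch, it picks a byte zero-extension for constants up to 255 and a word otherwise, and truncates the constant to QImode (low byte, sign-extended) for the andq operand.

```c
/* Hypothetical model of the "andqu" rewrite; not GCC code.
   Original:  load const,reg ; and.d src,reg      => reg = src & const
   Rewritten: movu.[bw] src,reg ; andq quick,reg  => reg = zext(src) & sext(quick) */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Like GCC's trunc_int_for_mode (v, QImode): low 8 bits, sign-extended. */
static int32_t trunc_to_qi(uint32_t v)
{
    return ((int32_t)(v & 0xFFu) ^ 0x80) - 0x80;
}

struct andqu_result {
    unsigned zext_bytes; /* 1 selects movu.b, 2 selects movu.w */
    int32_t quick;       /* andq operand */
};

/* ASSUMED eligibility rule, derived from the arithmetic rather than from
   the real "O" constraint: the truncated constant must fit the assumed
   andq range -32..31, and sign-extending it back must reproduce the
   constant within the zero-extended width. */
static bool andqu_rewrite(uint32_t c, struct andqu_result *out)
{
    if (c > 0xFFFFu)
        return false;
    unsigned bytes = c <= 255 ? 1 : 2; /* same byte/word split as the patch */
    uint32_t mask = bytes == 1 ? 0xFFu : 0xFFFFu;
    int32_t q = trunc_to_qi(c);
    if (q < -32 || q > 31)
        return false;
    if (((uint32_t)q & mask) != c)
        return false;
    out->zext_bytes = bytes;
    out->quick = q;
    return true;
}

/* Value left in the destination register by the original sequence. */
static uint32_t original_seq(uint32_t src, uint32_t c)
{
    return src & c;
}

/* Value left by the rewritten sequence: movu zero-extends the source,
   then andq applies its sign-extended immediate to all 32 bits. */
static uint32_t rewritten_seq(uint32_t src, const struct andqu_result *r)
{
    assert(r->zext_bytes == 1 || r->zext_bytes == 2);
    uint32_t reg = src & (r->zext_bytes == 1 ? 0xFFu : 0xFFFFu);
    return reg & (uint32_t)r->quick;
}

/* Check every eligible constant in 0..65535 against inputs covering all
   low 18 bits, plus their complements (so the high bits are set too).
   Stores the number of eligible constants; returns the mismatch count. */
static unsigned andqu_mismatches(unsigned *eligible)
{
    unsigned bad = 0;
    *eligible = 0;
    for (uint32_t c = 0; c <= 0xFFFFu; c++) {
        struct andqu_result r;
        if (!andqu_rewrite(c, &r))
            continue;
        ++*eligible;
        for (uint32_t i = 0; i < 0x40000u; i++) {
            uint32_t xs[2] = { i, ~i };
            for (int k = 0; k < 2; k++)
                if (original_seq(xs[k], c) != rewritten_seq(xs[k], &r))
                    bad++;
        }
    }
    return bad;
}
```

For example, andqu_rewrite(254, &r) selects a byte zero-extension with r.quick == -2, the movu.b/andq -2 pair from the new cris.md comment. The andu constants from the test-case are rejected: 95 does not fit the assumed andq range, and 2047 truncates to -1, which does not reproduce 2047 after a word zero-extension.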