https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118360
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Georg-Johann Lay <g...@gcc.gnu.org>:

https://gcc.gnu.org/g:0bb3223097e5ced4f9a13d18c6c65f2a9496437e

commit r15-7164-g0bb3223097e5ced4f9a13d18c6c65f2a9496437e
Author: Georg-Johann Lay <a...@gjlay.de>
Date:   Sat Jan 11 14:10:29 2025 +0100

AVR: PR118012 - Try to work around sick code from match.pd.

This patch tries to work around PR118012, which may use a full-fledged
multiplication instead of a simple bit test.  This is because match.pd's

/* (zero_one == 0) ? y : z <op> y -> ((typeof(y))zero_one * z) <op> y */
/* (zero_one != 0) ? z <op> y : y -> ((typeof(y))zero_one * z) <op> y */

"optimizes" code with op in { plus, ior, xor } like

   if (a & 1)
     b = b <op> c;

to something like:

   x1 = EXTRACT_BIT0 (a);
   x2 = c MULT x1;
   b = b <op> x2;

or

   x1 = EXTRACT_BIT0 (a);
   x2 = ZERO_EXTEND (x1);
   x3 = NEG x2;
   x4 = c AND x3;
   b = b <op> x4;

which is very expensive and may even result in a libgcc call for
a 32-bit multiplication on devices that don't even have MUL.
Notice that EXTRACT_BIT0 is already more expensive (slower, more
code, more register pressure) than a bit-test + branch.

The patch:

o Adds some combiner patterns that try to map sick code back
  to a bit test + branch.

o Adjusts costs to make MULT (x AND 1) cheap, in the hope that
  the middle-end will use that alternative (which we map to sane code).

o On devices without MUL, 32-bit multiplication was performed by a
  library call, which bypasses the MULT (x AND 1) and similar patterns.
  Therefore, mulsi3 is also allowed for devices without MUL so that
  we get a MULT pattern that can be transformed.  (Though this is not
  possible on AVR_TINY since it passes arguments on the stack).

o Adds a new command line option -mpr118012, so most of the patterns
  and cost computations can be switched off as they have
  avropt_pr118012 in their insn condition.

o Adds sign-extract.0 patterns unconditionally (no avropt_pr118012).

Notice that this patch is just a work-around, it's not a fix of the
root cause, which are the patterns in match.pd that don't care about
the target and don't even care about costs.

The work-around is incomplete, and 3 of the new tests are still failing.
This is because there are situations where it does not work:

* The MULT is realized as a library call.

* The MULT is realized as an ASHIFT, and the ASHIFT again is transformed
  into something else.  For example, with -O2 -mmcu=atmega128, ASHIFT(3)
  is transformed into ASHIFT(1) + ASHIFT(2).

        PR tree-optimization/118012
        PR tree-optimization/118360
gcc/
        * config/avr/avr.opt (-mpr118012): New undocumented option.
        * config/avr/avr-protos.h (avr_out_sextr)
        (avr_emit_skip_pixop, avr_emit_skip_clear): New protos.
        * config/avr/avr.cc (avr_adjust_insn_length)
        [case ADJUST_LEN_SEXTR]: Handle case.
        (avr_rtx_costs_1) [NEG]: Costs for NEG (ZERO_EXTEND (ZERO_EXTRACT)).
        [MULT && avropt_pr118012]: Costs for MULT (x AND 1).
        (avr_out_sextr, avr_emit_skip_pixop, avr_emit_skip_clear): New
        functions.
        * config/avr/avr.md [avropt_pr118012]: Add combine patterns with
        that condition that try to work around PR118012.
        (adjust_len) <sextr>: Add insn attr value.
        (pixop): New code iterator.
        (mulsi3) [avropt_pr118012 && !AVR_TINY]: Allow these in insn condition.

gcc/testsuite/
        * gcc.target/avr/mmcu/pr118012-1.h: New file.
        * gcc.target/avr/mmcu/pr118012-1-o2-m128.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-m128.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-o2-m103.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-m103.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-o2-t40.c: New test.
        * gcc.target/avr/mmcu/pr118012-1-os-t40.c: New test.
        * gcc.target/avr/mmcu/pr118360-1.h: New file.
        * gcc.target/avr/mmcu/pr118360-1-o2-m128.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-m128.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-o2-m103.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-m103.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-o2-t40.c: New test.
        * gcc.target/avr/mmcu/pr118360-1-os-t40.c: New test.
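
For illustration only, the three forms discussed in the commit message can be
written out in plain C.  This is a minimal sketch, not code from the patch or
from the new tests: the function names are hypothetical, <op> is fixed to xor,
and the operands are assumed to be 32 bits wide.  In the third form the mask is
applied to c, which is what makes it compute the same value as the other two.

```c
#include <stdint.h>

/* Source form: a bit test plus a conditional update.  */
uint32_t form_branch (uint8_t a, uint32_t b, uint32_t c)
{
  if (a & 1)
    b = b ^ c;
  return b;
}

/* First rewritten form: multiply c by the extracted bit 0 of a.  */
uint32_t form_mult (uint8_t a, uint32_t b, uint32_t c)
{
  uint32_t x1 = a & 1;      /* EXTRACT_BIT0 (a): 0 or 1 */
  uint32_t x2 = c * x1;     /* c MULT x1: 0 or c */
  return b ^ x2;
}

/* Second rewritten form: negate the zero-extended bit to build a mask.  */
uint32_t form_mask (uint8_t a, uint32_t b, uint32_t c)
{
  uint32_t x2 = (uint32_t) (a & 1);  /* ZERO_EXTEND (EXTRACT_BIT0 (a)) */
  uint32_t x3 = -x2;                 /* NEG: 0 or 0xFFFFFFFF (unsigned wrap) */
  uint32_t x4 = c & x3;              /* 0 or c */
  return b ^ x4;
}

/* Returns 1 if all three forms agree for the given inputs, else 0.  */
int forms_agree (uint8_t a, uint32_t b, uint32_t c)
{
  uint32_t r = form_branch (a, b, c);
  return r == form_mult (a, b, c) && r == form_mask (a, b, c);
}
```

All three functions return the same value for every input; they differ only in
cost.  On a device without a hardware multiplier, the first function needs
nothing beyond a bit test and a skipped update, which is the shape the new
combiner patterns try to recover.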