http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089
--- Comment #25 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-07 23:31:20 UTC --- Created attachment 28633 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28633 Arithmetic right shift rework 2 This could be an alternative approach for the arith right shifts. Applied to rev 193240: CSiBE -m4-single -O2 -ml -mpretend-cmove sum: 3159747 -> 3160047 +300 / +0.009494 % CSiBE -m2 -O2 -ml sum: 3314907 -> 3313319 -1588 / -0.047905 % The idea here is to let arith shifts initially go through a slightly more complex pattern 'ashrsi3_bridge_const', which allows the lib function address or shad constant to be CSE'ed early on. As mentioned in comment #18, the arith -> logical shift conversion is done by combine only under certain circumstances. After some trial and error I arrived at the 'ashrsi3_bridge_const' pattern, which makes combine convert most of the arith shifts into logical shifts. In some cases it also finds more opportunities, but regressions like this one still remain: void MC_put_xy_16_c (uint8_t * dest, const uint8_t * ref, const int stride, int height) { // must see a shlr instead of 2x shar. // somehow the loop prevents this... do { dest[0] = (((ref[0]+ref[0 +1]+(ref+stride)[0]+(ref+stride)[0 +1]+2)>>2)); ref += stride; dest += stride; } while (--height); } In addition to that, some unlucky register allocations appear, which cause unneeded reg-reg copies (it seems something has trouble utilizing commutative insns such as 'add'). The other problem is that patterns such as: (define_insn_and_split "*lshrsi3" [(set (match_operand:SI 0 "arith_reg_dest") (lshiftrt:SI (match_operand:SI 1 "arith_reg_operand") (match_operand:SI 2 "shift_count_operand"))) (clobber (reg:SI T_REG)) (use (match_operand:SI 3 "general_operand")) (use (match_operand:SI 4 "general_operand"))] "TARGET_SH1 && can_create_pseudo_p () && CONST_INT_P (operands[2])" "#" "&& 1" [(const_int 0)] { emit_insn (gen_lshrsi3 (operands[0], operands[1], operands[2])); DONE; }) have to be added for combine to convert more arith -> logical shifts. In this patch I've added only one such pattern. However, if for example zero_extract patterns are added, variations such as above would be required, too. This looks discomforting to me. Maybe the better solution would indeed be to add a arith -> logical shift conversion pass before combine, or try to convert arith shifts in the split pass after combine by looking at the data flow of the shifted values.