http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089



--- Comment #25 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-11-07 
23:31:20 UTC ---

Created attachment 28633

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28633

Arithmetic right shift rework 2



This could be an alternative approach for the arith right shifts.

Applied to rev 193240:



CSiBE -m4-single -O2 -ml -mpretend-cmove

sum:  3159747 -> 3160047    +300 / +0.009494 %



CSiBE -m2 -O2 -ml

sum:  3314907 -> 3313319    -1588 / -0.047905 %



The idea here is to let arith shifts initially go through a slightly more

complex pattern 'ashrsi3_bridge_const', which allows the lib function address

or shad constant to be CSE'ed early on.

As mentioned in comment #18, the arith -> logical shift conversion is done by

combine only under certain circumstances.  After some trial and error I arrived

at the 'ashrsi3_bridge_const' pattern, which makes combine convert most of the

arith shifts into logical shifts.  In some cases it also finds more

opportunities, but regressions like this one still remain:



void MC_put_xy_16_c

(uint8_t * dest, const uint8_t * ref, const int stride, int height)

{

  // must see a shlr instead of 2x shar.

  // somehow the loop prevents this...

  do

  {

    dest[0] = (((ref[0]+ref[0 +1]+(ref+stride)[0]+(ref+stride)[0 +1]+2)>>2));

    ref += stride; dest += stride;

  } while (--height);

}



In addition to that, some unlucky register allocations appear, which cause

unneeded reg-reg copies (it seems something has trouble utilizing commutative

insns such as 'add').



The other problem is that patterns such as:



(define_insn_and_split "*lshrsi3"

  [(set (match_operand:SI 0 "arith_reg_dest")

    (lshiftrt:SI (match_operand:SI 1 "arith_reg_operand")

             (match_operand:SI 2 "shift_count_operand")))

   (clobber (reg:SI T_REG))

   (use (match_operand:SI 3 "general_operand"))

   (use (match_operand:SI 4 "general_operand"))]

  "TARGET_SH1 && can_create_pseudo_p () && CONST_INT_P (operands[2])"

  "#"

  "&& 1"

  [(const_int 0)]

{

  emit_insn (gen_lshrsi3 (operands[0], operands[1], operands[2]));

  DONE;

})



have to be added for combine to convert more arith -> logical shifts.  In this

patch I've added only one such pattern.  However, if for example zero_extract

patterns are added, variations such as above would be required, too.  This

looks discomforting to me.  Maybe the better solution would indeed be to add a

arith -> logical shift conversion pass before combine, or try to convert arith

shifts in the split pass after combine by looking at the data flow of the

shifted values.

Reply via email to