On 5/15/19 8:30 AM, Robin Dapp wrote:
>> It would really help if you could provide testcases which show the
>> suboptimal code and any analysis you've done.
>
> I tried introducing a define_subst pattern that substitutes something
> one of two other subst patterns already changed.
>
> The first subst pattern helps remove a superfluous and on the shift
> count operand by accepting both variants, with and without an and for
> the insn pattern.
>
> (define_subst "masked_op_subst"
>   [(set (match_operand:DSI 0 "" "")
>         (shift:DSI (match_operand:DSI 1 "" "")
>                    (match_operand:SI 2 "" "")))]
>   ""
>   [(set (match_dup 0)
>         (shift:DSI (match_dup 1)
>                    (and:SI (match_dup 2)
>                            (match_operand:SI 3 "const_int_6bitset_operand" "jm6"))))])
>
> The second subst helps encode a shift count addition like $r1 + 1 as
> address style operand 1($r1) that is directly supported by the shift
> instruction.
>
> (define_subst "addr_style_op_subst"
>   [(set (match_operand:DSI_VI 0 "" "")
>         (shift:DSI_VI (match_operand:DSI_VI 1 "" "")
>                       (match_operand:SI 2 "" "")))]
>   ""
>   [(set (match_dup 0)
>         (shift:DSI_VI (match_dup 1)
>                       (plus:SI (match_operand:SI 2 "register_operand" "a")
>                                (match_operand 3 "const_int_operand" "n"))))])
>
> Both of these are also used in combination.
>
> Now, in order to get rid of the subregs in the pattern combine creates,
> I would need to be able to do something like
>
> (define_subst "subreg_subst"
>   [(set (match_operand:DI 0 "" "")
>         (shift:DI (match_operand:DI 1 "" "")
>                   (subreg:SI (match_dup:DI 2)))]
>
> where the (match_dup:DI 2) would capture both (and:SI ...) [with the
> first argument being either a register or an already substituted
> (plus:SI ...)] as well as a simple (plus:SI ...).
>
> As far as I can tell match_dup:mode can be used to change the mode of
> the top-level operation but the operands will remain the same. For
> this, a match_dup_deep or whatever would be useful. I'm pretty sure we
> don't want to open this can of worms, though :)
>
> To get rid of this, I explicitly duplicated all three subst combinations
> resulting in a lot of additional code. This is not necessary when the
> subregs are eliminated by the middle end via SHIFT_COUNT_TRUNCATED.
> Maybe there is a much more obvious way that I missed?

Painful. I doubt exposing the masking during the RTL expansion phase and
hoping the standard optimizers will eliminate it would work better --
though perhaps it would if the expanders queried the global range
information and elided the masking when the shift count was known to be
in range.

jeff
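
For illustration only, here is a rough sketch in plain C of the range-based
idea above: emit the explicit mask only when the known range of the shift
count does not already fit the shifter. Nothing here is GCC code;
struct count_range, shift_needs_mask and expand_shl64 are hypothetical
names, and the sketch assumes the actual count always lies inside the
range that is passed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the value range known for a shift count;
   this is not a GCC data structure.  */
struct count_range {
    int64_t min;   /* smallest possible shift count */
    int64_t max;   /* largest possible shift count */
};

/* Return true when the count must still be masked explicitly, i.e. when
   the known range is not entirely inside [0, bits - 1].  */
static bool shift_needs_mask(struct count_range r, unsigned bits)
{
    return r.min < 0 || r.max > (int64_t)bits - 1;
}

/* Model of a 64-bit left shift as an expander could emit it: the mask
   (the "and" with the low 6 bits set) is applied only when the range
   cannot prove the count is in bounds.  Assumes count lies within r.  */
static uint64_t expand_shl64(uint64_t value, int64_t count,
                             struct count_range r)
{
    unsigned amount;

    if (shift_needs_mask(r, 64))
        amount = (unsigned)((uint64_t)count & 63u);  /* explicit mask */
    else
        amount = (unsigned)count;                    /* mask elided */

    return value << amount;   /* amount is in [0, 63] here */
}
```

With a range of [0, 7] the mask is skipped and the plain shift pattern
would match directly; with [0, 100] the mask stays and the count is
reduced modulo 64 before the shift.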