On 11/07/2011 03:28 PM, Walter Lee wrote:
> gcc:
> http://gcc.gnu.org/ml/gcc-patches/2011-10/msg02084.html
Well, I must say I'm a bit disappointed that the two ports are just
dis-similar enough to not be merged.  And failing that, I'd prefer to
review them separately.

Tilepro:

>   (UNSPEC_INSN_ADDB                    1)
>   (UNSPEC_INSN_ADDBS_U                 2)
>   (UNSPEC_INSN_ADDH                    3)
>   (UNSPEC_INSN_ADDHS                   4)
>   (UNSPEC_INSN_ADDIB                   5)
>   (UNSPEC_INSN_ADDIH                   6)

All of these really ought to be re-written using proper rtl.  For instance,

> (define_insn "insn_addb"
>   [(set (match_operand:SI 0 "register_operand" "=r,r")
>         (unspec:SI [(match_operand:SI 1 "reg_or_0_operand" "%rO,rO")
>                     (match_operand:SI 2 "reg_or_cint_operand" "N,rO")]
>                    UNSPEC_INSN_ADDB))]
>   ""
>   "@
>    addib\t%0, %r1, %j2
>    addb\t%0, %r1, %r2"
>   [(set_attr "type" "X01,X01")])

ought to be

(define_insn "addv4qi3"
  [(set (match_operand:V4QI 0 "register_operand" "=r,r")
        (plus:V4QI (match_operand:V4QI 1 "reg_or_0_operand" "%rO,rO")
                   (match_operand:V4QI 2 "reg_or_cint_operand" "N,rO")))]
  ...)

Similarly, addbs_u should be usaddv4qi3 using (us_plus:V4QI ...), addh
should be (plus:V2HI ...), and so forth throughout most of the unspecs
present in the file.

As it is, you're making absolutely no use of the vectorizer.
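E.g. for the saturating byte add, something along these lines -- just a
sketch: I'm copying the rO constraints and the X01 type from your addb
pattern above, and guessing there is no immediate form of the saturating
add, so adjust as needed:

;; Unsigned saturating V4QI addition, replacing UNSPEC_INSN_ADDBS_U.
(define_insn "usaddv4qi3"
  [(set (match_operand:V4QI 0 "register_operand" "=r")
        (us_plus:V4QI (match_operand:V4QI 1 "reg_or_0_operand" "%rO")
                      (match_operand:V4QI 2 "reg_or_0_operand" "rO")))]
  ""
  "addbs_u\t%0, %r1, %r2"
  [(set_attr "type" "X01")])

With the standard names in place, the vectorizer and the generic expanders
will pick these patterns up automatically.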
> (define_insn "*movqi_insn"
>   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,r,r,r,r,m")
>         (match_operand:QI 1 "move_operand" "r,I,J,K,m,rO"))]
>   "(register_operand (operands[0], QImode)
>     || reg_or_0_operand (operands[1], QImode))"
>   "@
>    move\t%0, %r1
>    movei\t%0, %1
>    moveli\t%0, %1
>    auli\t%0, zero, %h1

Alternatives 1,2,3 are dead code -- there will never be a QImode constant
in those ranges.  Similarly for movhi_insn, alternative 3.

> (define_insn_and_split "*movsi_big_immediate"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (match_operand:SI 1 "immediate_operand" "i"))]
>   "! move_operand (operands[1], SImode)"
>   "#"
>   "&& 1"
>   [(const_int 0)]
> {
>   tilepro_expand_set_const32 (operands[0], operands[1]);
>   DONE;
> })

Why?  Seems like the movsi expander would handle this.  If you provide the
pattern, it'll get used.  Which will cause the rtl optimizers to undo this
split any chance they get.

> (define_expand "movdi"
>   [(set (match_operand:DI 0 "nonimmediate_operand" "")
>         (match_operand:DI 1 "general_operand" ""))]
>   ""
> {
>   if (tilepro_expand_mov (DImode, operands))
>     DONE;
> })
>
> (define_insn_and_split "*movdi_insn"
>   [(set (match_operand:DI 0 "nonimmediate_operand" "=r,r,m")
>         (match_operand:DI 1 "general_operand" "i,rO,rO"))]
>   "!MEM_P (operands[0]) || REG_P (operands[1])"
>   "#"
>   ; We need a temp reg for addr+4 for the store, so we can't do this late.
>   "!MEM_P (operands[0]) || can_create_pseudo_p()"
>   [(set (match_dup 2) (match_dup 3))
>    (set (match_dup 4) (match_dup 5))]

I strongly suggest that you drop this pattern and let the rtl optimizers
split the multi-word move.

> (define_insn "movstrictqi"
>   [(set (strict_low_part (match_operand:QI 0 "register_operand" "+r"))
>         (match_operand:QI 1 "reg_or_0_operand" "rO"))]
>   ""
>   "mm\t%r0, %r1, %r0, 0, 7"
>   [(set_attr "type" "X01")])
>
> (define_insn "movstricthi"
>   [(set (strict_low_part (match_operand:HI 0 "register_operand" "+r"))
>         (match_operand:HI 1 "reg_or_0_operand" "rO"))]
>   ""
>   "mm\t%r0, %r1, %r0, 0, 15"
>   [(set_attr "type" "X01")])

Do you really get any advantage over a more canonical expansion of insv?

Or, rather, insv internal patterns such as

(define_insn "*insv_0"
  [(set (zero_extract:SI (match_operand:SI 0 "register_operand" "+r")
                         (const_int 2)
                         (const_int 7))
        (match_operand:SI 1 "register_operand" "r"))]
  ""
  "tblidxb0\t%0,%1")

(define_insn "*insv_1"
  [(set (zero_extract:SI (match_operand:SI 0 "register_operand" "+r")
                         (const_int 0)
                         (match_operand:SI 1 "u5bit_cint_operand" ""))
        (match_operand:SI 2 "register_operand" "r"))]
  ""
  "mm\t%0,%0,%2,0,%1")

(define_insn "*insv_2"
  [(set (zero_extract:SI (match_operand:SI 0 "register_operand" "+r")
                         (match_operand:SI 1 "u5bit_cint_operand" "")
                         (match_operand:SI 2 "u5bit_cint_operand" ""))
        (zero_extract:SI (match_operand:SI 3 "register_operand" "r")
                         (match_dup 1)
                         (match_dup 2)))]
  ""
{
  operands[2] = GEN_INT (INTVAL (operands[1]) + INTVAL (operands[2]));
  return "mm\t%0,%0,%3,%1,%2";
})

You'll probably want to keep your existing "*mm" pattern, but these are
things that the combine pass is likely to generate.

> (define_insn_and_split "*addsi3_big_immediate"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (plus:SI (match_operand:SI 1 "reg_or_0_operand" "%rO")
>                  (match_operand:SI 2 "const_int_operand" "i")))]
>   "! add_operand (operands[2], SImode)"
>   "#"
>   "&& 1"
>   [(const_int 0)]
> {
>   bool ok = tilepro_expand_addsi (operands[0], operands[1], operands[2]);
>   gcc_assert (ok);
>   DONE;
> })

Similar question to movsi_big_immediate.

> ;; Change a rotate right to a rotate left, since that's all we have.
> (define_expand "rotrsi3"
>   [(set (match_operand:SI 0 "register_operand" "")
>         (rotate:SI (match_operand:SI 1 "reg_or_0_operand" "")
>                    (match_operand:SI 2 "reg_or_u5bit_operand" "")))]
>   ""
>   "
>   if (CONST_INT_P (operands[2]))
>     {
>       operands[2] = GEN_INT ((-INTVAL (operands[2])) & 31);
>     }
>   else
>     {
>       /* No need to subtract from 32 since count taken mod 32 anyway.  */
>       rtx reg = gen_reg_rtx (SImode);
>       emit_insn (gen_negsi2 (reg, operands[2]));
>       operands[2] = reg;
>     }
>   ")

optabs.c will do exactly this if the pattern is not present.  Similarly for
the DImode logicals (popcount, parity, etc).

> (define_insn "*mnz_insn"
> (define_insn "*mnz_insn_reverse"
> (define_insn "*mz_insn"
> (define_insn "*mz_insn_reverse"
> (define_insn "mvnz_insn"
> (define_insn "*mvnz_insn_reverse"

All of these ought to be compressed into a single pattern.  Otherwise
reload won't get a chance to select the proper alternative all the time.

(define_predicate "eqne_operator"
  (match_code "eq,ne"))

(define_insn "*movcc"
  [(set (match_operand:SI 0 "register_operand" "=r, r, r, r")
        (if_then_else:SI
          (match_operator 4 "eqne_operator"
            [(match_operand:SI 1 "reg_or_0_operand" "rO,rO,rO,rO")
             (const_int 0)])
          (match_operand:SI 2 "reg_or_0_operand" "rO, O,rO, 0")
          (match_operand:SI 3 "reg_or_0_operand" " O,rO, 0,rO")))]
  ""
  "@
   m%c4 %0, %r1, %r2
   m%C4 %0, %r1, %3
   mv%c4 %0, %r1, %r2
   mv%C4 %0, %r1, %r3"
  [(set_attr "type" "*,*,Y0,Y0")])

And print_operand handling %c as printing a condition (z, nz) and %C
printing its inverse.

> (define_insn "zero_extendqihi2"

Should not be needed.

> (define_insn "*lh"
> (define_insn_and_split "*extendhisi2"

These need to be combined in order for reload to do the right thing.
Suppose some HImode variable is spilled to the stack.  If you split the
patterns, reload will perform a load then split to the two shifts.  If you
combine the patterns, reload will notice memory is an alternative and
perform the LH right away.

Similarly for extendqihi2.
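For the SImode case that might look roughly like the following -- again
only a sketch, with the memory constraint, the split details and the lack
of a type attribute all guesses on my part:

;; Combined extendhisi2: alternative 0 is split into the two shifts after
;; reload, alternative 1 lets reload use the lh load directly.
(define_insn_and_split "extendhisi2"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
        (sign_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
  ""
  "@
   #
   lh\t%0, %1"
  "&& reload_completed && register_operand (operands[1], HImode)"
  [(set (match_dup 0) (ashift:SI (match_dup 2) (const_int 16)))
   (set (match_dup 0) (ashiftrt:SI (match_dup 0) (const_int 16)))]
{
  operands[2] = gen_lowpart (SImode, operands[1]);
})

That way a spilled HImode value is reloaded with the single lh instead of a
plain load followed by the two shifts.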
> ;; Divide stubs.  These exist to work around a bug in expmed.c, which
> ;; will not attempt to convert a divide by constant into a multiply
> ;; unless there is a pattern for a divide of the same mode.

If this is still present in mainline, please file a bug.

> (define_insn "_return"
>   [(return)
>    (use (reg:SI 55))]
>   "reload_completed"
>   "jrp\tlr"
>   [(set_attr "type" "X1")])

Post merge, consider changing this to simple_return to enable shrink
wrapping.  This may also involve epilogue unwind fixups though.

> (define_insn "add_small_got"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (lo_sum:SI (match_operand:SI 1 "reg_or_0_operand" "rO")
>                    (unspec:SI [(match_operand:SI 2 "symbolic_operand" "")]
>                               UNSPEC_GOT_SYM)))]

I strongly suggest that you wrap these unspecs in CONST, and handle them as
appropriate in the previous HIGH/LO_SUM patterns, and in the print_operand
cases.  This goes for the TLS patterns as well.

I'm surprised you don't support post_{inc,dec,modify} addressing modes, via
the "add" set of memory instructions.  Do they not really perform well on
the architecture, or did the gcc version you started with not generate them
well?

I also don't see support for AND addresses, as for lw_na.  And yet you seem
to be using those addressing modes in tilepro_expand_unaligned_load.  I can
only assume these are failing validate_mem, and forcing the address into a
register first?

> tilepro_allocate_stack (rtx op0, rtx op1)

Invalid rtl sharing in this function.

>   /* Mark all insns we just emitted as frame-related.  */
>   for (; last_insn != NULL_RTX; last_insn = next_insn (last_insn))
>     RTX_FRAME_RELATED_P (last_insn) = 1;

You can't do just this for the epilogue.  You have to emit REG_CFA_RESTORE
notes.  I'd prefer to see all new ports use _only_ REG_CFA_* notes, and
avoid REG_FRAME_RELATED_EXPR.  I won't force this though.

> /* We generate PC relative SYMBOL_REFs as an optimization, to avoid
>    going through the GOT when the symbol is local to the compilation
>    unit.  But such a symbol requires that the common text_label that
>    we generate at the beginning of the function be in the same section
>    as the reference to the SYMBOL_REF.  This may not be true if we
>    generate hot/cold sections.  This function looks for such cases and
>    replaces such references with the longer sequence going through the
>    GOT.

Why not use gp-relative references?  A small matter of extending the
assembler with new relocations, perhaps.

>       fprintf (file, "\tlw r11, r10\n");
>       fprintf (file, "\taddi r10, r10, 4\n");
>       fprintf (file, "\tlw r10, r10\n");

I guess going back to the previous question, not

	lwadd r11, r10, 4
	lw r10, r10

?


r~