I have a fictitious machine with an 8-bit word size that can nevertheless handle 16-bit adds and 16-bit moves. I am trying to build the most efficient support for an addsi3 insn. My problem is that if I try to split addsi3 into a couple of addhi3 insns (using a define_expand template), the compiler appears to ignore the declaration and instead implements addsi3 as a series of addqi's along with some carry-propagation RTXs; i.e. the compiler defaults to the word size of the machine and I can't seem to override this. I could let it generate its long list of addqi's and then recombine them with something like a peephole optimizer, but that seems really inefficient to me, especially when I can state explicitly how the larger insn should be split.
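To make the intended decomposition concrete, here is a minimal C sketch of the arithmetic that a split into two 16-bit adds would have to perform. It is only an illustration: the function name add32_via_16 is made up and is not part of the port, and the sketch assumes the high-half add also consumes the carry out of the low-half add.

```c
#include <stdint.h>

/* Hypothetical illustration: a 32-bit add built from two 16-bit adds.
   The carry out of the low half is fed into the high half. */
static uint32_t add32_via_16(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a;
    uint16_t a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b;
    uint16_t b_hi = (uint16_t)(b >> 16);

    /* Low halves: a plain 16-bit add that wraps modulo 2^16. */
    uint16_t lo = (uint16_t)(a_lo + b_lo);

    /* The low add carried out exactly when the wrapped sum is
       smaller than one of its inputs. */
    uint16_t carry = (uint16_t)(lo < a_lo);

    /* High halves: 16-bit add plus the carry from the low half. */
    uint16_t hi = (uint16_t)(a_hi + b_hi + carry);

    return ((uint32_t)hi << 16) | lo;
}
```

For example, add32_via_16(0x0000FFFF, 1) has to produce a carry into the high half, which is the case a pair of independent 16-bit adds would get wrong.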
If I use the following addsi3 template:

    (define_insn "addsi3"
      [(set (match_operand:SI 0 "general_operand" "=g")
            (plus:SI (match_operand:SI 1 "general_operand" "g")
                     (match_operand:SI 2 "general_operand" "g")))]
      ""
      "addsi3 %1 %2 %0 ;(%1 plus %2)->%0"
    )

I can observe addsi being used in the assembly output of my test case. If I use:

    (define_expand "addsi"
      [(set (match_operand:SI 0 "general_operand" "=g")
            (plus:SI (match_operand:SI 1 "general_operand" "g")
                     (match_operand:SI 2 "general_operand" "g")))]
      ""
      "{
        emit_insn (gen_addhi3 (custom_subword(operands[0], 0, SImode),
                               custom_subword(operands[1], 0, SImode),
                               custom_subword(operands[2], 0, SImode)));
        emit_insn (gen_addhi3 (custom_subword(operands[0], 1, SImode),
                               custom_subword(operands[1], 1, SImode),
                               custom_subword(operands[2], 1, SImode)));
        DONE;
      }"
    )

the output becomes a mess of addqi, cmpqi, and branches.

Any help would be great.

Thanks
Marty