2010/4/6, Jim Wilson <wil...@codesourcery.com>:
> On 04/06/2010 02:24 AM, roy rosen wrote:
> > (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
> >         (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
> >             (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))
> >
>
> Only subregs are decomposed. So use vec_select instead of subreg. I see
> you already have a vec_concat to combine the two v2hi into one v4hi, so
> there is no need for the subreg in the dest. You should try eliminating
> that first and see if that helps. If that isn't enough, then replace the
> subregs in the source with vec_select operations.
>
> Jim
>
Thanks Jim,

I have implemented your suggestion. I am now using vec_select, and the
subreg optimization no longer decomposes the instruction.

The problem now is that I am stuck with redundant instructions (which I
translate to move insns). For example:

(insn 37 32 38 7 a.c:25 (set (reg:V2HI 116)
        (vec_concat:V2HI (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 0 [0x0])
                    ]))
            (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 1 [0x1])
                    ])))) 121 {v4hi_extract_low_v2hi} (expr_list:REG_DEAD (reg:V4HI 112)
        (nil)))

This instruction eventually has to be optimized out somehow. It extracts
a V2HI from a V4HI. A V4HI is stored in a register pair (like r0:r1), so
extracting a V2HI simply means taking one of these registers, which does
not need an instruction.

I saw in arm/neon.md that they have a similar problem:

; FIXME: We wouldn't need the following insns if we could write subregs of
; vector registers. Make an attempt at removing unnecessary moves, though
; we're really at the mercy of the register allocator.

(define_insn "move_lo_quad_v4si"
  [(set (match_operand:V4SI 0 "s_register_operand" "+w")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "s_register_operand" "w")
          (vec_select:V2SI (match_dup 0)
                           (parallel [(const_int 2) (const_int 3)]))))]
  "TARGET_NEON"
{
  int dest = REGNO (operands[0]);
  int src = REGNO (operands[1]);

  if (dest != src)
    return "vmov\t%e0, %P1";
  else
    return "";
}
  [(set_attr "neon_type" "neon_bp_simple")]
)

Their solution is also not complete.

What is the proper way to handle such a case, and how do I let gcc know
that this is a simple move instruction so that gcc can optimize it out?

Thanks, Roy.