Hi,

I have encountered several problems with the lower-subreg optimization in my port. In some cases, insns are decomposed in the subreg1 pass and are not recomposed later, so the final code uses two insns instead of one.
For example, I have the following dump before subreg1:

(note 30 93 31 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 31 30 32 7 a.c:25 (set (reg:V4HI 112)
        (mem:V4HI (reg/f:SI 98 [ __vect_p_41 ]) [2 S8 A64])) 115 {*movv4hi_load} (nil))

(insn 32 31 33 7 a.c:25 (set (reg:V4HI 113)
        (mem:V4HI (reg/f:SI 99 [ __vect_p_36 ]) [2 S8 A64])) 115 {*movv4hi_load} (nil))

(insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
            (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:V4HI 112) 4)
            (subreg:V2HI (reg:V4HI 113) 4))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi} (expr_list:REG_EQUAL (plus:V4HI (reg:V4HI 112)
            (reg:V4HI 113))
        (nil)))

(insn 36 35 37 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

which turns into:

(note 30 93 94 7 [bb 7] NOTE_INSN_BASIC_BLOCK)

(insn 94 30 95 7 a.c:25 (set (reg:SI 142)
        (mem:SI (reg/f:SI 98 [ __vect_p_41 ]) [2 S4 A64])) 62 {movsi_load} (nil))

(insn 95 94 96 7 a.c:25 (set (reg:SI 143 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 98 [ __vect_p_41 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 96 95 97 7 a.c:25 (set (reg:SI 144)
        (mem:SI (reg/f:SI 99 [ __vect_p_36 ]) [2 S4 A64])) 62 {movsi_load} (nil))

(insn 97 96 33 7 a.c:25 (set (reg:SI 145 [+4 ])
        (mem:SI (plus:SI (reg/f:SI 99 [ __vect_p_36 ])
                (const_int 4 [0x4])) [2 S4 A32])) 62 {movsi_load} (nil))

(insn 33 97 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
        (plus:V2HI (subreg:V2HI (reg:SI 142) 0)
            (subreg:V2HI (reg:SI 144) 0))) 118 {addv2hi3} (nil))

(insn 34 33 35 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 4)
        (plus:V2HI (subreg:V2HI (reg:SI 143 [+4 ]) 0)
            (subreg:V2HI (reg:SI 145 [+4 ]) 0))) 118 {addv2hi3} (nil))

(insn 35 34 36 7 a.c:25 (set (reg:V4HI 114)
        (vec_concat:V4HI (subreg:V2HI (reg:V4HI 114) 0)
            (subreg:V2HI (reg:V4HI 114) 4))) 119 {concat_v2hi_to_v4hi} (nil))

(insn 36 35 98 7 a.c:25 (set (mem:V4HI (reg/f:SI 97 [ __vect_p_47 ]) [2 S8 A64])
        (reg:V4HI 114)) 116 {*movv4hi_store} (nil))

Notice that the loads are now done in SI mode, which is twice as expensive as loading in V4HI mode.

Can someone please help with this? Should this code be decomposed and then recomposed (which does not happen), or should it not be decomposed in the first place? What should I change so that I end up with a single V4HI load?

Thanks,
Roy.