2010/4/6, Jim Wilson <wil...@codesourcery.com>:
> On 04/06/2010 02:24 AM, roy rosen wrote:
> > (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
> >         (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
> >             (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3} (nil))
> >
>
> Only subregs are decomposed.  So use vec_select instead of subreg.  I see
> you already have a vec_concat to combine the two v2hi into one v4hi, so
> there is no need for the subreg in the dest.  You should try eliminating
> that first and see if that helps.  If that isn't enough, then replace the
> subregs in the source with vec_select operations.
>
> Jim
>

Thanks Jim,

I have implemented your suggestion and now I am using vec_select and
the subreg optimization does not decomopose the instruction.
The problem now is that I get stuck with redundent instructions (that
I translate to move insns).
For example:

(insn 37 32 38 7 a.c:25 (set (reg:V2HI 116)
        (vec_concat:V2HI (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 0 [0x0])
                    ]))
            (vec_select:HI (reg:V4HI 112)
                (parallel [
                        (const_int 1 [0x1])
                    ])))) 121 {v4hi_extract_low_v2hi}
(expr_list:REG_DEAD (reg:V4HI 112)
        (nil)))

This instruction eventually has to be optimized out somehow. It is
dealing with extracting V2HI from V4HI. V4HI is stored in a register
pair (like r0:r1) and V2HI would simply mean to take one of these
registers - this does not need an instruction.

I saw in arm/neon.md that they have a similar problem:

; FIXME: We wouldn't need the following insns if we could write subregs of
; vector registers. Make an attempt at removing unnecessary moves, though
; we're really at the mercy of the register allocator.

(define_insn "move_lo_quad_v4si"
  [(set (match_operand:V4SI 0 "s_register_operand" "+w")
        (vec_concat:V4SI
          (match_operand:V2SI 1 "s_register_operand" "w")
          (vec_select:V2SI (match_dup 0)
                           (parallel [(const_int 2) (const_int 3)]))))]
  "TARGET_NEON"
{
  int dest = REGNO (operands[0]);
  int src = REGNO (operands[1]);

  if (dest != src)
    return "vmov\t%e0, %P1";
  else
    return "";
}
  [(set_attr "neon_type" "neon_bp_simple")]
)

Their solution is also not complete.
What is the proper way to handle such a case and how do I let gcc know
that this is a simple move instruction so that gcc would be able to
optimize it out?

Thanks, Roy.

Reply via email to