On 01/14/2016 04:47 PM, Igor Shevlyakov wrote:
Guys,
I'm trying to make the compiler generate better code for a superscalar
in-order machine, but can't find the right way to do it.
Imagine the following code:
long f(long* p, long a, long b)
{
long a1 = a << 2;
long a2 = a1 + b;
return p[a1] + p[a2];
}
By default the compiler generates something like this, in pseudo-asm:
shl r3, r3, 2
add r4, r3, r4
ld8 r15, [r2 + r3 * 8]
ld8 r2, [r2 + r4 * 8]
{ add r2, r2, r15 ; ret }
but it would be much better this way:
{ sh_add r4, r4, (r3 << 2) ; shl r3, r3, 2 }
{ ld8 r15, [r2 + r3 * 8] ; ld8 r2, [r2 + r4 * 8] }
{ add r2, r2, r15 ; ret }
The second sequence is two cycles shorter. The combine pass even shows
patterns like this, but fails to transform them because they are wrapped
in a PARALLEL:
Failed to match this instruction:
(parallel [
(set (reg:DI 56)
(plus:DI (mult:DI (reg:DI 3 r3 [ a ])
(const_int 4 [0x4]))
(reg:DI 4 r4 [ b ])))
(set (reg/v:DI 40 [ a1 ])
(ashift:DI (reg:DI 3 r3 [ a ])
(const_int 2 [0x2])))
])
You can always write a pattern which matches the PARALLEL. You can then
either arrange to emit the assembly code from that pattern, or split the
pattern (after reload/LRA).
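A sketch of what such a pattern might look like — the RTL shape mirrors the
PARALLEL from the combine dump above, but the mode, predicates, constraints
and output template are placeholders for whatever the target actually uses:

```lisp
;; Sketch only: matches the shift-add/shift PARALLEL that combine builds.
;; DImode, "register_operand" and the "r" constraints are stand-ins.
(define_insn "*shadd_shl"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (plus:DI (mult:DI (match_operand:DI 2 "register_operand" "r")
                          (const_int 4))
                 (match_operand:DI 3 "register_operand" "r")))
   (set (match_operand:DI 1 "register_operand" "=r")
        (ashift:DI (match_dup 2)
                   (const_int 2)))]
  ""
  "{ sh_add %0, %3, (%2 << 2) ; shl %1, %2, 2 }")
```

Alternatively, a define_insn_and_split with the same RTL template could keep
the pair together through combine and split it back apart after reload.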
What would be the proper way to perform reorganizations like this in a
general way?
The same goes with the pointer increment:
add r2, r2, 1
ld r3, [r2+0]
would be much better off like this:
{ ld r3, [r2 + 1] ; add r2, r2, 1 }
Are those kinds of things overlooked, or did I fail to set something in
the machine-dependent portion?
Similarly. You may also get some mileage from LEGITIMIZE_ADDRESS,
though it may not see the add and the load together, which would hinder
its ability to generate the code you want.
Note that using a define_split to match these things prior to reload
likely won't work because combine will likely see the split pattern as
being the same cost as the original insns.
In general the combiner really isn't concerned with superscalar issues,
though you can tackle some superscalar things with creative patterns
that match parallels, or that match something more complex and then
split it up later.
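For the pointer-increment case, one such creative pattern might bundle the
load with the update in a single PARALLEL. Again a sketch with placeholder
modes, predicates and constraints; note that a real port would more likely
enable post-increment/pre-modify addressing (POST_INC / PRE_MODIFY) and let
the auto-inc-dec machinery form these instead:

```lisp
;; Sketch: load and pointer update issued together.  The load uses the
;; old base plus 1, and the base register ends up incremented by 1,
;; matching the { ld ; add } bundle from the mail.
(define_insn "*load_update"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (mem:DI (plus:DI (match_operand:DI 1 "register_operand" "+r")
                         (const_int 1))))
   (set (match_dup 1)
        (plus:DI (match_dup 1)
                 (const_int 1)))]
  ""
  "{ ld %0, [%1 + 1] ; add %1, %1, 1 }")
```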
Note that GCC supports a number of superscalar architectures -- they
were very common for workstations and high end embedded processors for
many years. MIPS, PPC, HPPA, even x86, etc. all have variants which were
tuned for superscalar code generation. I'm sure there's tricks you can
exploit in every one of those architectures to help generate code with
fewer data dependencies and thus more opportunities to exploit the
superscalar nature of your processor.
jeff