Re: [PATCH] teach emit_store_flag to use clz/ctz

Maciej W. Rozycki Sat, 12 May 2012 11:36:22 -0700

On Sun, 6 May 2012, Andrew Pinski wrote:

> >  For the record: MIPS processors that implement CLZ/CLO (for some reason
> > CTZ/CTO haven't been added to the architecture, but these operations can
> > be cheaply transformed into CLZ/CLO) generally have a dedicated unit that
> > causes no pipeline stall for these instructions even in the simplest
> > pipeline designs like the M4K -- IOW they are issued at the usual one
> > instruction per pipeline clock rate.
> 
> Even on Octeon this is true, though Octeon has seq/sneq too, so it
> does not matter in the end.
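
The quoted remark that CTZ/CTO can be cheaply transformed into CLZ/CLO can be illustrated in C. This is a sketch only. It assumes 32-bit registers and models CLZ as returning 32 for a zero input, as MIPS32 CLZ does. It also uses the identity ctz(x) = 32 - clz(~x & (x - 1)), which is one common choice and not necessarily what GCC would emit. All function names are hypothetical.

```c
#include <stdint.h>

/* Model of MIPS32 CLZ: leading zero bits, 32 for an input of 0
   (unlike __builtin_clz, which is undefined for 0). */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    /* Shift left until the top bit is set, counting the shifts. */
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Model of MIPS32 CLO: leading ones are leading zeros of the complement. */
static unsigned clo32(uint32_t x) { return clz32(~x); }

/* CTZ via CLZ: ~x & (x - 1) turns the trailing zeros of x into a block of
   low-order ones (and clears everything else), so 32 minus its leading-zero
   count is the trailing-zero count; this gives 32 for x == 0. */
static unsigned ctz32(uint32_t x) { return 32 - clz32(~x & (x - 1)); }

/* CTO: trailing ones are trailing zeros of the complement. */
static unsigned cto32(uint32_t x) { return ctz32(~x); }
```

On MIPS32 the CTZ identity needs only a few ALU instructions around a single CLZ, with no branch.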


 Does Octeon's pipeline qualify as simple?  I had thought it was a 
high-performance core.  The M4K is one of the smallest and simplest 
MIPS chips ever built.

 And actually all MIPS processors (back to 1985's MIPS I ISA) support 
two-instruction set-if-equal and set-if-not-equal sequences:

        xor     rd, rt, rs
        sltiu   rd, rd, 1

and:

        xor     rd, rt, rs
        sltu    rd, zero, rd

respectively, which may still be preferable to any alternative, 
especially one involving a branch.
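
The two sequences above can be modelled in C to show why they work. This is a sketch; the helper names are hypothetical, and `sltiu` is modelled as sign-extending its 16-bit immediate before the unsigned compare.

```c
#include <stdint.h>

/* Model of "sltu rd, rs, rt": unsigned compare, result 0 or 1. */
static uint32_t sltu(uint32_t rs, uint32_t rt) { return rs < rt ? 1u : 0u; }

/* Model of "sltiu rd, rs, imm": the 16-bit immediate is sign-extended,
   then compared unsigned; for imm == 1 this is simply 1. */
static uint32_t sltiu(uint32_t rs, int16_t imm)
{
    return rs < (uint32_t)(int32_t)imm ? 1u : 0u;
}

/* set-if-equal: rt ^ rs is zero exactly when rt == rs. */
static uint32_t set_if_equal(uint32_t rt, uint32_t rs)
{
    uint32_t rd = rt ^ rs;       /* xor   rd, rt, rs   */
    return sltiu(rd, 1);         /* sltiu rd, rd, 1    */
}

/* set-if-not-equal: any non-zero xor result is unsigned-above zero. */
static uint32_t set_if_not_equal(uint32_t rt, uint32_t rs)
{
    uint32_t rd = rt ^ rs;       /* xor   rd, rt, rs   */
    return sltu(0, rd);          /* sltu  rd, zero, rd */
}
```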

> Note I originally was the one who proposed this optimization for
> PowerPC even before I saw what XLC did.  See PR 10588 (which I filed 9
> years ago); it seems we are about to fix it soon.

 For that case, i.e. set-if-zero and set-if-non-zero, you can use the 
second instruction of each sequence above on its own (again supported 
by all MIPS processors):

        sltiu   rd, rs, 1

and

        sltu    rd, zero, rs
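
In the same hypothetical C model, dropping the `xor` leaves a single unsigned comparison against zero. This is a sketch with illustrative names only.

```c
#include <stdint.h>

/* set-if-zero, "sltiu rd, rs, 1": 0 is the only unsigned value below 1. */
static uint32_t set_if_zero(uint32_t rs) { return rs < 1u ? 1u : 0u; }

/* set-if-non-zero, "sltu rd, zero, rs": every non-zero value is above 0. */
static uint32_t set_if_nonzero(uint32_t rs) { return 0u < rs ? 1u : 0u; }
```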

However, GCC does not seem smart enough to use them well with your 
example.  I'd expect something like:

        sltiu   $4, $4, 1
        sltiu   $2, $5, 1
        jr      $31
         or     $2, $4, $2

however I get:

        beq     $4, $0, .L3
         nop
        jr      $31
         sltiu  $2, $5, 1
.L3:
        jr      $31
         li     $2, 1

which is never faster and obviously not smaller either.  There is also 
no need to skip the second comparison as the short-circuit rules of 
logical OR would suggest: the operands are all in registers, so 
evaluating the second comparison unconditionally is harmless.
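
This point can be checked with a small C model of both sequences. The sketch assumes the o32 calling convention ($4 and $5 hold the first two arguments, $2 the return value). On that basis the example appears to compute `a == 0 || b == 0`, though that source is inferred from the assembly, not quoted from the PR. The function names are hypothetical.

```c
#include <stdint.h>

/* Branch-free sequence expected: both comparisons always evaluated. */
static int expected_seq(uint32_t a, uint32_t b)
{
    uint32_t r4 = a < 1u;        /* sltiu $4, $4, 1  -> a == 0            */
    uint32_t r2 = b < 1u;        /* sltiu $2, $5, 1  -> b == 0            */
    return (int)(r4 | r2);       /* jr $31 with "or $2, $4, $2" in slot   */
}

/* Branchy sequence GCC emitted: skips the second comparison if a == 0. */
static int emitted_seq(uint32_t a, uint32_t b)
{
    if (a == 0)                  /* beq $4, $0, .L3 (nop in delay slot)   */
        return 1;                /* .L3: jr $31 with "li $2, 1" in slot   */
    return (int)(b < 1u);        /* jr $31 with "sltiu $2, $5, 1" in slot */
}
```

Neither comparison can trap or has side effects, so the two functions agree for every input. Short-circuit evaluation therefore buys nothing here.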

 However, this pessimisation is avoided for MIPS IV and later processors, 
which have move-if-non-zero (MOVN); there the second comparison is 
always evaluated:

        sltiu   $5, $5, 1
        li      $2, 1
        jr      $31
         movn   $2, $5, $4
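
The MOVN variant can be modelled the same way. This sketch assumes the MIPS IV semantics of `movn rd, rs, rt`: copy rs to rd when rt is non-zero, otherwise leave rd unchanged. It also assumes the same inferred `a == 0 || b == 0` example. The helper names are hypothetical.

```c
#include <stdint.h>

/* Model of "movn rd, rs, rt": rd = rs if rt != 0, else rd is unchanged. */
static uint32_t movn(uint32_t rd, uint32_t rs, uint32_t rt)
{
    return rt != 0 ? rs : rd;
}

static int movn_seq(uint32_t a, uint32_t b)
{
    uint32_t r5 = b < 1u;        /* sltiu $5, $5, 1  -> b == 0             */
    uint32_t r2 = 1;             /* li    $2, 1      -> result if a == 0   */
    r2 = movn(r2, r5, a);        /* movn  $2, $5, $4 -> if a != 0, b == 0  */
    return (int)r2;              /* jr $31 (movn sits in the delay slot)   */
}
```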

Any chance of improving this with the fix you mentioned?

  Maciej
