The generated code is now:
myswaps16:
        rlwinm 10,3,8,16,23
        rlwinm 9,3,24,24,31
        or 9,9,10
        extsh 3,9
        blr

myswapu16:
        rlwinm 10,3,8,16,23
        rlwinm 9,3,24,24,31
        or 9,9,10
        rlwinm 3,9,0,0xffff
        blr

While it was (without my patch):

myswaps16:
        slwi 9,3,8
        srawi 3,3,8
        or 3,9,3
        extsh 3,3
        blr

myswapu16:
        srwi 9,3,8
        rlwinm 3,3,8,16,23
        or 3,3,9
        blr

I don't know PowerPC, but I am not sure it's an improvement. Is it?

slwi and srwi are just extended mnemonics for the same rlwinm instruction,
so that's the same.  The last instruction in the new unsigned variant is
superfluous, since it is just setting the top bits to zero, and they
already are.  rlwinm is ever so slightly better than srawi (in the signed
version), because srawi sets the carry bit in addition to the GPR, so
that is an improvement.
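
For anyone who wants to check the two claims above without PowerPC hardware, here is a rough C model of the 32-bit behaviour of rlwinm. It is only a sketch: the helper names (rotl32, mask32, rlwinm, new_swapu16_before_mask, check_shift_mnemonics_and_mask) are made up for illustration, the mask helper handles only the non-wrapping case MB <= ME, and nothing here models 64-bit registers or the carry bit.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh bits. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Mask with bits mb..me set, in IBM bit numbering (bit 0 is the MSB).
   Illustrative helper: only the non-wrapping case mb <= me is handled. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;        /* bits mb..31 */
    uint32_t up_to_me = 0xffffffffu << (31 - me); /* bits 0..me  */
    return from_mb & up_to_me;
}

/* Model of "rlwinm RA,RS,SH,MB,ME": rotate left, then AND with mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* First three instructions of the new myswapu16, i.e. the value in r9
   just before the final "rlwinm 3,9,0,0xffff". */
static uint32_t new_swapu16_before_mask(uint32_t r3)
{
    uint32_t r10 = rlwinm(r3, 8, 16, 23);  /* rlwinm 10,3,8,16,23 */
    uint32_t r9 = rlwinm(r3, 24, 24, 31);  /* rlwinm 9,3,24,24,31 */
    return r9 | r10;                       /* or 9,9,10 */
}

/* Checks, for every 16-bit input with and without junk in the upper half:
   slwi/srwi by 8 match their rlwinm forms, and the final mask is a no-op. */
static void check_shift_mnemonics_and_mask(void)
{
    for (uint32_t lo = 0; lo <= 0xffffu; lo++) {
        uint32_t inputs[2] = { lo, lo | 0xdead0000u };
        for (int i = 0; i < 2; i++) {
            uint32_t x = inputs[i];
            assert(rlwinm(x, 8, 0, 23) == (x << 8));   /* slwi rx,ry,8 */
            assert(rlwinm(x, 24, 8, 31) == (x >> 8));  /* srwi rx,ry,8 */
            uint32_t r9 = new_swapu16_before_mask(x);
            assert(rlwinm(r9, 0, 16, 31) == r9);       /* mask 0xffff */
        }
    }
}
```

Calling check_shift_mnemonics_and_mask() from a small main() runs through all 16-bit inputs; the two rlwinm masks in the sequence only ever select bits 16..31, which is why the trailing zero-extend cannot change anything.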

But we can do this sequence in just two instructions:

        rlwimi 3,3,16,0,15
        rlwinm 3,3,8,16,31
        blr

so some more work is needed to make this optimal ;-)
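
The two-instruction sequence can be sanity-checked the same way. Again a sketch only, with hypothetical helper names: rlwimi is modelled as rotate-then-insert under the mask, the mask helper assumes MB <= ME, and only the unsigned 16-bit result is compared (the signed variant would still want a sign extension afterwards).

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by sh bits. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return sh ? (x << sh) | (x >> (32 - sh)) : x;
}

/* Mask with bits mb..me set, IBM bit numbering (bit 0 is the MSB).
   Illustrative helper: only the non-wrapping case mb <= me is handled. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    return (0xffffffffu >> mb) & (0xffffffffu << (31 - me));
}

/* Model of "rlwinm RA,RS,SH,MB,ME": rotate left, then AND with mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* Model of "rlwimi RA,RS,SH,MB,ME": rotate RS left, insert the masked
   bits into RA, and keep the other bits of RA unchanged. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* The proposed sequence, applied to the incoming value of r3. */
static uint32_t two_insn_swapu16(uint32_t r3)
{
    r3 = rlwimi(r3, r3, 16, 0, 15);  /* rlwimi 3,3,16,0,15: copy low half up */
    r3 = rlwinm(r3, 8, 16, 31);      /* rlwinm 3,3,8,16,31: rotate and mask  */
    return r3;
}

/* Compare against a plain C byte swap for every 16-bit input, with and
   without junk in the upper half of the register. */
static void check_two_insn_swap(void)
{
    for (uint32_t lo = 0; lo <= 0xffffu; lo++) {
        uint32_t expected = ((lo << 8) | (lo >> 8)) & 0xffffu;
        assert(two_insn_swapu16(lo) == expected);
        assert(two_insn_swapu16(lo | 0xdead0000u) == expected);
    }
}
```

The rlwimi overwrites the upper half with a copy of the lower half, so whatever was in the top 16 bits on entry does not matter, and the result comes out already zero-extended.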

Christophe, it looks like the zero-extend in the unsigned case is not
needed on any target?  Assuming the shifts are at least SImode, of
course (I'm too lazy to check, sorry).


Segher
