The generated code is now:
myswaps16:
rlwinm 10,3,8,16,23
rlwinm 9,3,24,24,31
or 9,9,10
extsh 3,9
blr
myswapu16:
rlwinm 10,3,8,16,23
rlwinm 9,3,24,24,31
or 9,9,10
rlwinm 3,9,0,0xffff
blr
Without my patch, it was:
myswaps16:
slwi 9,3,8
srawi 3,3,8
or 3,9,3
extsh 3,3
blr
myswapu16:
srwi 9,3,8
rlwinm 3,3,8,16,23
or 3,3,9
blr
I don't know PowerPC, but I am not sure it's an improvement. Is it?
slwi and srwi are just extended mnemonics for the same rlwinm
instruction, so that's the same.  The last instruction in the new
unsigned variant is superfluous, since it is just setting the top bits
to zero, and they already are.  rlwinm is ever so slightly better than
srawi (in the signed version), because srawi sets the carry bit in
addition to the GPR, so that is an improvement.
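
For anyone who wants to check the two claims above without a PowerPC box, here is a rough C model. It is only a sketch: the helper names (rotl32, mask32, rlwinm, slwi, srwi, new_unsigned_before_mask, check_model) are made up for illustration, and mask32 handles only the non-wrapping mb <= me case. It models slwi and srwi as rlwinm with particular operands, and checks that the OR result in the new unsigned sequence already fits in 16 bits before the final mask.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, for 0 <= n <= 31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Ones from bit mb through bit me in IBM numbering (bit 0 is the MSB).
   Assumption: mb <= me; wrap-around masks are not modelled here. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
}

/* rlwinm ra,rs,sh,mb,me: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* slwi ra,rs,n is rlwinm ra,rs,n,0,31-n (0 <= n <= 31). */
static uint32_t slwi(uint32_t rs, unsigned n)
{
    return rlwinm(rs, n, 0, 31u - n);
}

/* srwi ra,rs,n is rlwinm ra,rs,32-n,n,31 (1 <= n <= 31 in this model). */
static uint32_t srwi(uint32_t rs, unsigned n)
{
    return rlwinm(rs, 32u - n, n, 31);
}

/* The new myswapu16 sequence, stopping before "rlwinm 3,9,0,0xffff". */
static uint32_t new_unsigned_before_mask(uint32_t r3)
{
    uint32_t r10 = rlwinm(r3, 8, 16, 23);   /* low byte moved up  */
    uint32_t r9  = rlwinm(r3, 24, 24, 31);  /* high byte moved down */
    return r9 | r10;
}

/* Exhaustive check over all 16-bit inputs. */
static void check_model(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        assert(slwi(x, 8) == (x << 8));
        assert(srwi(x, 8) == (x >> 8));
        uint32_t r9 = new_unsigned_before_mask(x);
        assert((r9 & 0xFFFFu) == r9);  /* the final mask changes nothing */
    }
}
```

Calling check_model() from a main() should pass silently if the model is right; the masks in the two rlwinm instructions already confine the result to the low 16 bits, whatever is in the upper half of r3.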
But we can do this sequence in just two instructions:
rlwimi 3,3,16,0,15
rlwinm 3,3,8,16,31
blr
so some more work is needed to make this optimal ;-)
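
The two-instruction sequence can be sanity-checked with a similar C model. Again this is just a sketch under assumptions: the helper names (rotl32, mask32, rlwinm, rlwimi, swap16_two_insn, check_swap) are invented here, and mask32 covers only mb <= me. Note that it models the unsigned result; the signed function would presumably still want a sign extension afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits, for 0 <= n <= 31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Ones from bit mb through bit me in IBM numbering (bit 0 is the MSB).
   Assumption: mb <= me; wrap-around masks are not modelled here. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
}

/* rlwinm ra,rs,sh,mb,me: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

/* rlwimi ra,rs,sh,mb,me: rotate left, then insert the masked bits
   into ra, leaving the other bits of ra unchanged. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh,
                       unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Model of:  rlwimi 3,3,16,0,15 ; rlwinm 3,3,8,16,31 */
static uint32_t swap16_two_insn(uint32_t r3)
{
    r3 = rlwimi(r3, r3, 16, 0, 15);  /* copy the low half into the high half */
    r3 = rlwinm(r3, 8, 16, 31);      /* rotate by one byte, keep the low 16 bits */
    return r3;
}

/* Compare against a plain byte swap for every 16-bit input. */
static void check_swap(void)
{
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        uint32_t expect = ((x & 0xFFu) << 8) | (x >> 8);
        assert(swap16_two_insn(x) == expect);
    }
}
```

For example, swap16_two_insn(0x1234) gives 0x3412: after the rlwimi the register holds 0x12341234, and rotating that left by 8 and masking keeps 0x3412.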
Christophe, it looks like the zero-extend in the unsigned case is not
needed on any target? Assuming the shifts are at least SImode, of
course (I'm too lazy to check, sorry).
Segher