On 4/29/23 10:24, Roger Sayle wrote:

This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
   return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine-instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

With this patch, we now generate:
foo:    swpn r2
        ret

Achieving this with combine's four-instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split patterns are introduced
to treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.
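
As a rough illustration of the forms described above, here is a minimal C
sketch of the value each one computes.  The function names are hypothetical
and chosen only for this example; they are not part of the patch, and the
code models the arithmetic only, not the RTL patterns themselves.

```c
#include <stdint.h>

/* QImode nibble swap: exchange the two 4-bit halves of a byte. */
uint8_t swap_nibbles_qi(uint8_t b) {
    return (uint8_t)((b << 4) | (b >> 4));
}

/* Nibble swap of the low byte followed by zero-extension to 16 bits:
   the high byte of the result is zero. */
uint16_t swap_nibbles_zext(uint16_t x) {
    return (uint16_t)swap_nibbles_qi((uint8_t)x);
}

/* Nibble swap of the low byte followed by sign-extension to 16 bits.
   The xor/subtract idiom sign-extends bit 7 portably. */
int16_t swap_nibbles_sext(uint16_t x) {
    int s = swap_nibbles_qi((uint8_t)x);
    return (int16_t)((s ^ 0x80) - 0x80);
}

/* HImode form: the high byte of x is preserved and the low byte has its
   nibbles swapped.  The two operands of the OR have no bits in common. */
uint16_t swap_nibbles_hi(uint16_t x) {
    return (uint16_t)((x & 0xff00) | swap_nibbles_zext(x));
}
```

Because the two operands in swap_nibbles_hi share no set bits, replacing
the `|` with `^` or `+` yields the same value.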

The naming of the new code iterators is taken from i386.md.
The any_rotate code iterator is used in my next (split out) patch.

This patch has been tested by building a cross-compiler to xstormy16-elf
from x86_64-pc-linux-gnu and confirming the new test cases pass.
Ok for mainline?


2023-04-29  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
         * config/stormy16/stormy16.md (any_lshift): New code iterator.
         (any_or_plus): Likewise.
         (any_rotate): Likewise.
         (*<any_lshift>_and_internal): New define_insn_and_split to
         recognize a logical shift followed by an AND, and split it
         again after reload.
         (*swpn): New define_insn matching xstormy16's swpn.
         (*swpn_zext): New define_insn recognizing swpn followed by
         zero_extendqihi2, i.e. with the high byte set to zero.
         (*swpn_sext): Likewise, for swpn followed by cbw.
         (*swpn_sext_2): Likewise, for an alternate RTL form.
         (*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
         sequence is split in the correct place to recognize the *swpn_zext
         followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
         * gcc.target/xstormy16/swpn-1.c: New QImode test case.
         * gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
         * gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
         * gcc.target/xstormy16/swpn-4.c: New HImode test case.
Ah, bridge patterns.

OK for the trunk.

jeff
