I was hoping I could ask an ARM backend maintainer to look over the
following patch.

I was examining the code generated for the following C snippet on a
Raspberry Pi,

static inline int popcount_lut8(unsigned *buf, int n)
{
  int cnt=0;
  unsigned int i;
  do {
    i = *buf;
    cnt += lut[i&255];
    cnt += lut[i>>8&255];
    cnt += lut[i>>16&255];
    cnt += lut[i>>24];
    buf++;
  } while(--n);
  return cnt;
}

and was surprised to see the following instruction sequence generated by
the compiler:

mov    r5, r2, lsr #8
uxtb   r5, r5

This sequence can be performed by a single ARM instruction:

uxtb   r5, r2, ror #8
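
For anyone who wants to convince themselves of the equivalence, here is a
small C model of the two forms.  This is only a sketch: the helper names
(ror32, extract_shift_then_extend, extract_rotate_extend, check_uxtb_ror8)
are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits, 0 < n < 32 (models the "ror #n" operand). */
static uint32_t ror32(uint32_t x, unsigned n)
{
  return (x >> n) | (x << (32 - n));
}

/* Model of the two-instruction sequence:
     mov  rd, rm, lsr #8
     uxtb rd, rd  */
static uint32_t extract_shift_then_extend(uint32_t x)
{
  uint32_t t = x >> 8;   /* logical shift right by 8 */
  return t & 0xffu;      /* zero-extend the low byte */
}

/* Model of the single instruction:
     uxtb rd, rm, ror #8  */
static uint32_t extract_rotate_extend(uint32_t x)
{
  return ror32(x, 8) & 0xffu;
}

/* Spot-check that both forms agree on a few sample values. */
static void check_uxtb_ror8(void)
{
  static const uint32_t samples[] = {
    0u, 0x12345678u, 0xffffffffu, 0x0000ab00u, 0x80000001u
  };
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    assert(extract_shift_then_extend(samples[k])
           == extract_rotate_extend(samples[k]));
}
```

The bits that the rotate wraps around to the top of the word are discarded
by the byte extension, which is why the rotate can stand in for the shift.
Calling check_uxtb_ror8() from a main() is enough to run the check.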

The attached patch allows GCC's combine pass to take advantage of ARM's
uxtb-with-rotate functionality to implement the above zero_extract, and
likewise to use sxtb with rotate to implement sign_extract.  ARM's uxtb
and sxtb can only be used with rotates of 0, 8, 16 and 24, and of these
only 8 and 16 are useful [ror #0 is a nop, and extends with ror #24 can
be implemented using regular shifts], so the approach here is to add the
six missing but useful instructions (byte extends with ror #8 and
ror #16, plus 16-bit extends with ror #8, each in zero- and
sign-extending form) as six separate define_insn patterns in arm.md,
rather than try to be clever with new predicates.
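
The reasoning about which rotations matter can be checked with a small C
model.  Again this is just a sketch under my own naming (rotate_right,
sign_extend8, model_uxtb, model_sxtb, model_asr24 and check_rotation_cases
are all hypothetical helpers, not anything in GCC); results are kept as
32-bit two's complement bit patterns to avoid relying on
implementation-defined signed shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by n bits; n == 0 returns x unchanged. */
static uint32_t rotate_right(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Sign-extend the low 8 bits to a 32-bit two's complement pattern. */
static uint32_t sign_extend8(uint32_t v)
{
  v &= 0xffu;
  return (v ^ 0x80u) - 0x80u;
}

/* Model of "uxtb rd, rm, ror #n". */
static uint32_t model_uxtb(uint32_t x, unsigned n)
{
  return rotate_right(x, n) & 0xffu;
}

/* Model of "sxtb rd, rm, ror #n". */
static uint32_t model_sxtb(uint32_t x, unsigned n)
{
  return sign_extend8(rotate_right(x, n));
}

/* Model of "mov rd, rm, asr #24" as a bit pattern. */
static uint32_t model_asr24(uint32_t x)
{
  uint32_t t = x >> 24;
  return (x & 0x80000000u) ? (t | 0xffffff00u) : t;
}

static void check_rotation_cases(void)
{
  static const uint32_t samples[] = {
    0u, 0x12345678u, 0xffffffffu, 0x80000000u, 0x7f80ff01u
  };
  for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
    {
      uint32_t x = samples[k];
      /* ror #24 selects the top byte, so a plain shift already suffices. */
      assert(model_uxtb(x, 24) == x >> 24);          /* lsr #24 */
      assert(model_sxtb(x, 24) == model_asr24(x));   /* asr #24 */
      /* ror #8 and ror #16 select a middle byte, where a shift alone
         would leave the higher bits in place.  */
      assert(model_uxtb(x, 8) == ((x >> 8) & 0xffu));
      assert(model_uxtb(x, 16) == ((x >> 16) & 0xffu));
    }
}
```

The ror #24 cases collapse to a single shift because nothing lies above
the top byte, whereas the middle bytes need both the move and the mask,
and that is the pair of operations the extend-with-rotate form folds into
one instruction.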

Alas, later ARM hardware has advanced bit field instructions, and earlier
ARM cores didn't support extend-with-rotate, so this appears to only
benefit ARMv6-era CPUs.

The following patch has been minimally tested by building cc1 of a
cross-compiler and confirming the desired instructions appear in the
assembly output for the test case.  Alas, my minimal Raspberry Pi
hardware is unlikely to be able to bootstrap GCC or run the testsuite, so
I'm hoping an ARM expert can check (and confirm) whether this change is
safe and suitable.  [Thanks in advance and apologies for any
inconvenience].


2018-01-14  Roger Sayle  <ro...@nextmovesoftware.com>

        * config/arm/arm.md (*arm_zeroextractsi2_8_8,
        *arm_signextractsi2_8_8, *arm_zeroextractsi2_8_16,
        *arm_signextractsi2_8_16, *arm_zeroextractsi2_16_8,
        *arm_signextractsi2_16_8): New.

2018-01-14  Roger Sayle  <ro...@nextmovesoftware.com>

        * gcc.target/arm/extend-ror.c: New test.


Cheers,
Roger
--
Roger Sayle, PhD.
NextMove Software Limited
Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY


Attachment: arm_zext.log
Description: Binary data

Attachment: arm_zext.patch
Description: Binary data

/* { dg-do compile } */
/* { dg-options "-O -march=armv6" } */
/* { dg-prune-output "switch .* conflicts with" } */

unsigned int zeroextractsi2_8_8(unsigned int x)
{
  return (unsigned char)(x>>8);
}

unsigned int zeroextractsi2_8_16(unsigned int x)
{
  return (unsigned char)(x>>16);
}

unsigned int signextractsi2_8_8(unsigned int x)
{
  return (int)(signed char)(x>>8);
}

unsigned int signextractsi2_8_16(unsigned int x)
{
  return (int)(signed char)(x>>16);
}

unsigned int zeroextractsi2_16_8(unsigned int x)
{
  return (unsigned short)(x>>8);
}

unsigned int signextractsi2_16_8(unsigned int x)
{
  return (int)(short)(x>>8);
}

/* { dg-final { scan-assembler-times ", ror #8" 4 } } */
/* { dg-final { scan-assembler-times ", ror #16" 2 } } */
