Bug#785475: arm64 shift+rotate optimization bug

Magnus Holmgren Sat, 16 May 2015 11:51:37 -0700

Package: gcc-4.9
Version: 4.9.2-16

I think I may have discovered an optimizer bug that results in incorrect code 
when building nettle. The Camellia cipher contains code similar to the 
following, which reproduces the bug:


#include <stdint.h>
#define ROTL32(n,x) (((x)<<(n)) | ((x)>>(-(n)&31)))
#define ROTR32(n,x) (((x)>>(n)) | ((x)<<(-(n)&31)))

uint64_t func(uint64_t x1, uint64_t x2) {

  uint32_t dw;

  dw = (x1 & x2) >> 32; x1 ^= ROTL32(1, dw);
  return x1;
}

The above results in the following machine code with -O1 or greater:

   0:   8a010001        and     x1, x0, x1
   4:   9361fc21        asr     x1, x1, #33
   8:   2a0103e1        mov     w1, w1
   c:   ca000020        eor     x0, x1, x0
  10:   d65f03c0        ret

which would be correct, I believe, if we substitute ROTR32 for ROTL32.
Note that if we use dw, for example by printing it, 

Version 4.9.2-10 of the package produces the correct code:

   0:   8a010001        and     x1, x0, x1
   4:   d360fc21        lsr     x1, x1, #32
   8:   13817c21        ror     w1, w1, #31
   c:   ca000020        eor     x0, x1, x0
  10:   d65f03c0        ret
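
For anyone wanting to check a given compiler at run time rather than by reading the disassembly, here is a minimal sketch. func is copied from the reproducer above; func_ref is a hypothetical reference (the name and the use of volatile temporaries are assumptions, not part of nettle) that spells the rotate out step by step so the optimizer has no chance to fuse it with the preceding shift.

```c
#include <stdint.h>

#define ROTL32(n,x) (((x)<<(n)) | ((x)>>(-(n)&31)))

/* Function under test, copied from the reproducer. */
uint64_t func(uint64_t x1, uint64_t x2) {
  uint32_t dw;

  dw = (x1 & x2) >> 32; x1 ^= ROTL32(1, dw);
  return x1;
}

/* Hypothetical reference: same computation, but every intermediate
   goes through a volatile 32-bit temporary, so the shift by 32 and
   the rotate by 1 cannot be combined into a single shift. */
uint64_t func_ref(uint64_t x1, uint64_t x2) {
  volatile uint32_t dw = (uint32_t)((x1 & x2) >> 32);
  volatile uint32_t hi = dw << 1;   /* bits that move left by one */
  volatile uint32_t lo = dw >> 31;  /* top bit wraps around to bit 0 */

  return x1 ^ (uint64_t)(hi | lo);
}
```

With x1 = x2 = 0x8000000100000000, dw is 0x80000001, its left rotation by one is 0x00000003, and the expected return value is 0x8000000100000003. By my reading of the 4.9.2-16 listing, the asr-based sequence would instead yield 0x80000001c0000000 for that input, so comparing func against func_ref on this value should expose the problem.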

Moreover, the following inline function

static inline uint32_t rotl32 (int n, uint32_t x)
{
  return (x << n) | (x >> (-n & 31));
}

results in equivalently incorrect, but much more compact, machine code:

   0:   8a010001        and     x1, x0, x1
   4:   ca818400        eor     x0, x0, x1, asr #33
   8:   d65f03c0        ret
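
The message does not show how rotl32 is called. A plausible sketch, assuming it simply replaces the ROTL32 macro in the reproducer (the caller name func_inline is hypothetical):

```c
#include <stdint.h>

static inline uint32_t rotl32 (int n, uint32_t x)
{
  return (x << n) | (x >> (-n & 31));
}

/* Hypothetical caller: the reproducer with the macro swapped for
   the inline function. The name is an assumption. */
uint64_t func_inline(uint64_t x1, uint64_t x2) {
  uint32_t dw;

  dw = (x1 & x2) >> 32;   /* upper 32 bits of the AND */
  x1 ^= rotl32(1, dw);    /* rotate left by one, XOR into the low word */
  return x1;
}
```

A correctly compiled func_inline should return 0x8000000100000003 for x1 = x2 = 0x8000000100000000, the same value as the macro version.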

-- 
Magnus Holmgren        holmg...@debian.org
Debian Developer
