https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276

            Bug ID: 92276
           Summary: Embedded __attribute__ ((optimize("unroll-loops"))) is
                    not working together with '__attribute__
                    ((__always_inline__))'
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Lijian.Zhang at arm dot com
  Target Milestone: ---

Dear experts,
I'm trying to use '__attribute__ ((optimize("unroll-loops")))' to apply
automatic loop unrolling to a static inline function that is also marked
'__attribute__ ((__always_inline__))'.
The assembly output, however, shows that the loop is not unrolled. The compile
command is 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S'.

If, instead, I add the -funroll-loops option to the command line, i.e.,
compile with 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72
-funroll-loops unroll.c -S', the assembly output shows that the loop is
unrolled.

And if I compile without the -funroll-loops option but comment out
'__attribute__ ((__always_inline__))', then '__attribute__
((optimize("unroll-loops")))' does take effect and the loop is unrolled.

So the two attributes appear not to work together, which seems unreasonable
to me. I want some functions to be inlined and also to have the loops inside
them unrolled automatically, since the loop iteration counts are fixed (the
calls in main pass the constant lengths 40 and 32).

lijian@net-arm-d05-08:~/C/unroll$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lijian@net-arm-d05-08:~/C/unroll$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy";
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

lijian@net-arm-d05-08:~/C/unroll$ gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S

lijian@net-arm-d05-08:~/C/unroll$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid


////////////////////
#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

/* Force inlining, and request loop unrolling for this function only. */
static inline __attribute__ ((__always_inline__))
__attribute__ ((optimize("unroll-loops")))
unsigned int clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  /* Consume 8 bytes per CRC32CX instruction while possible... */
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));

  /* ...then handle the 4-, 2- and 1-byte tails. */
  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));

  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));

  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));

  return v;
}

int main (int argc, char *argv[])
{
    unsigned char s[40] = {argc, 0, argc, 0};
    unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
    unsigned int v = 0xbeefdead, vv = 0xdeadbeef;
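    /* argv[1] is the iteration count; guard against a missing argument. */
    if (argc < 2) {
        fprintf (stderr, "usage: %s <iterations>\n", argv[0]);
        return 1;
    }
    int len = strtol (argv[1], NULL, 10);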

    for (int i = 0; i < len; i++) {
        v = clib_crc32c (v, s, 40);
        vv = clib_crc32c (vv, ss, 32);
    }

    printf ("%8X\n", v);
    printf ("%8X\n", vv);
    return 0;
}
////////////////////
