https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92276
Bug ID: 92276
Summary: Embedded __attribute__ ((optimize("unroll-loops"))) is not working together with '__attribute__ ((__always_inline__))'
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: Lijian.Zhang at arm dot com
Target Milestone: --

Dear experts,

I'm trying to use '__attribute__ ((optimize("unroll-loops")))' to apply automatic loop unrolling to a static inline function that is also marked '__attribute__ ((__always_inline__))'. However, the assembly output shows that the loop is not unrolled. The compile command is 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S'.

If I instead pass the -funroll-loops option, i.e., compile with 'gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 -funroll-loops unroll.c -S', the assembly output shows that the loop is unrolled.

And if I compile without -funroll-loops but comment out '__attribute__ ((__always_inline__))', then '__attribute__ ((optimize("unroll-loops")))' does take effect.

So the two attributes do not seem to work together, which seems unreasonable to me. I want some functions to be inlined and also have the loops inside them unrolled automatically, since the loop iteration count is fixed.

lijian@net-arm-d05-08:~/C/unroll$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1~18.04.1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
lijian@net-arm-d05-08:~/C/unroll$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
lijian@net-arm-d05-08:~/C/unroll$ gcc -march=armv8-a+crc -O2 -W -Wall -mtune=cortex-a72 unroll.c -S
lijian@net-arm-d05-08:~/C/unroll$ lscpu
Architecture: aarch64
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 4
Vendor ID: ARM
Model: 2
Model name: Cortex-A72
Stepping: r0p2
BogoMIPS: 100.00
L1d cache: 32K
L1i cache: 48K
L2 cache: 1024K
L3 cache: 16384K
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s): 16-31
NUMA node2 CPU(s): 32-47
NUMA node3 CPU(s): 48-63
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

////////////////////
#include <stdio.h>
#include <stdlib.h>
#include <arm_acle.h>

/* Both attributes are requested: the function should be inlined and its
   loops unrolled. */
static inline __attribute__ ((__always_inline__))
__attribute__ ((optimize("unroll-loops")))
unsigned int
clib_crc32c (unsigned int v, unsigned char * s, int len)
{
  /* Consume 8, then 4, then 2, then 1 byte(s) per iteration using the
     ARMv8 CRC32C instructions. */
  for (; len >= 8; len -= 8, s += 8)
    v = __crc32cd (v, *((unsigned long *) s));
  for (; len >= 4; len -= 4, s += 4)
    v = __crc32cw (v, *((unsigned int *) s));
  for (; len >= 2; len -= 2, s += 2)
    v = __crc32ch (v, *((unsigned short *) s));
  for (; len >= 1; len -= 1, s += 1)
    v = __crc32cb (v, *((unsigned char *) s));
  return v;
}

int
main (int argc, char *argv[])
{
  unsigned char s[40] = {argc, 0, argc, 0};
  unsigned char ss[32] = {argc, 0, argc, 0, argc, 0};
  unsigned int v = 0xbeefdead, vv = 0xdeadbeef;

  if (argc < 2)
    {
      fprintf (stderr, "usage: %s N\n", argv[0]);
      return 1;
    }
  int len = strtol (argv[1], NULL, 10);

  /* len is 40 or 32 at each call site, so after inlining every loop in
     clib_crc32c has a compile-time trip count. */
  for (int i = 0; i < len; i++)
    {
      v = clib_crc32c (v, s, 40);
      vv = clib_crc32c (vv, ss, 32);
    }
  printf ("%8X\n", v);
  printf ("%8X\n", vv);
  return 0;
}
////////////////////
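The reproducer above needs <arm_acle.h> and only builds for AArch64. The following is a minimal portable sketch, not part of the original report, that keeps the same attribute combination but, as an assumption, replaces the CRC32C instructions with a bitwise software CRC32C (reflected Castagnoli polynomial 0x82F63B78, no pre- or post-inversion, standing in for __crc32cb and __crc32cd) and keeps only the 8-byte and 1-byte loops. It also includes a variant that requests unrolling per loop with `#pragma GCC unroll` (available since GCC 8) instead of the optimize attribute; whether that per-loop annotation survives always_inline inlining in GCC 8.3 has not been verified here. The names sw_crc32c_u8, sw_crc32c_u64, crc32c_attr and crc32c_pragma are hypothetical.

```c
#include <stdint.h>

/* Hypothetical software stand-in for __crc32cb: one byte of CRC32C with the
   reflected Castagnoli polynomial 0x82F63B78 and no pre/post inversion. */
static inline uint32_t
sw_crc32c_u8 (uint32_t crc, uint8_t byte)
{
  crc ^= byte;
  for (int k = 0; k < 8; k++)
    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
  return crc;
}

/* Hypothetical stand-in for __crc32cd: fold 8 bytes in memory order, which
   matches a little-endian 64-bit load. Reading bytes individually avoids the
   unaligned, type-punned load used in the reproducer. */
static inline uint32_t
sw_crc32c_u64 (uint32_t crc, const unsigned char *p)
{
  for (int k = 0; k < 8; k++)
    crc = sw_crc32c_u8 (crc, p[k]);
  return crc;
}

/* Same attribute combination as in the report. */
static inline __attribute__ ((__always_inline__))
__attribute__ ((optimize ("unroll-loops")))
uint32_t
crc32c_attr (uint32_t v, const unsigned char *s, int len)
{
  for (; len >= 8; len -= 8, s += 8)
    v = sw_crc32c_u64 (v, s);
  for (; len >= 1; len -= 1, s += 1)
    v = sw_crc32c_u8 (v, *s);
  return v;
}

/* Alternative to try: annotate each loop with #pragma GCC unroll instead of
   changing the function's optimization options. */
static inline __attribute__ ((__always_inline__))
uint32_t
crc32c_pragma (uint32_t v, const unsigned char *s, int len)
{
#pragma GCC unroll 8
  for (; len >= 8; len -= 8, s += 8)
    v = sw_crc32c_u64 (v, s);
#pragma GCC unroll 8
  for (; len >= 1; len -= 1, s += 1)
    v = sw_crc32c_u8 (v, *s);
  return v;
}
```

Both functions compute the same value. Starting from 0xFFFFFFFF and XOR-ing the result with 0xFFFFFFFF gives the standard CRC-32C, so `crc32c_attr (0xFFFFFFFFu, (const unsigned char *) "123456789", 9) ^ 0xFFFFFFFFu` should equal the check value 0xE3069283. Calling each function with a constant length from a caller and inspecting the `gcc -O2 -S` output is one way to compare the two approaches; on a target other than AArch64 the unrolling decisions may differ from those in the original report.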