On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc <gcc@gcc.gnu.org> wrote:
>
> GCC compiler is able to understand if the prefix of an array holds
> constant/static data and apply compiler optimizations on that partial
> constant part of the array, however, it seems that it is not leveraging
> this information in all cases.
>
> On understanding the behavior of compiler optimization for partially
> constant arrays and especially how the loop splitting pass could have an
> influence on the potential constant related optimizations such as constant
> folding I am using  the following example:
>
>
>
> Considering an array where the prefix of that array is compile-time
> constant data, and the rest of the array is runtime data, should the
> compiler be able to optimize the calculation for the first part of the
> array?
>
> Let's look at the below example:
>
>
>
> You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> In our sum function we are measuring the some of the array elements, where
> the first half of it is  static compile-time constants and the second half
> are dynamic data.
>
> When we compile this with the "x86-64 GCC 12.1" compiler with "-O3
> -std=c++2a " flags, we get the following assembly code:
>
>
>
>  main:
>
>         mov     rax, QWORD PTR .LC0[rip]
>
>         mov     DWORD PTR [rsp-28], edi
>
>         mov     DWORD PTR [rsp-32], 3
>
>         movq    xmm1, QWORD PTR [rsp-32]
>
>         mov     QWORD PTR [rsp-40], rax
>
>         movq    xmm0, QWORD PTR [rsp-40]
>
>         lea     eax, [rdi+1]
>
>         add     edi, 2
>
>         mov     DWORD PTR [rsp-24], eax
>
>         paddd   xmm0, xmm1
>
>         mov     DWORD PTR [rsp-20], edi
>
>         movq    xmm1, QWORD PTR [rsp-24]
>
>         paddd   xmm0, xmm1
>
>         movd    eax, xmm0
>
>         pshufd  xmm2, xmm0, 0xe5
>
>         movd    edx, xmm2
>
>         add     eax, edx
>
>         ret
>
> .LC0:
>
>         .long   200
>
>         .long   2
>
>
>
>
>
> However, if we add an “if” condition in the loop for calculating the result
> of the sum, the if condition seems to enable the loop splitting pass:
>
>
>
> You can see the code and its assembly here:  https://godbolt.org/z/ejecbjMKG
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>         res += array[i];
>
>     else
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     int arr1[6] = {200,2,3, argc, argc+1, argc+2};
>
>     return  sum(arr1, 6);
>
> }
>
>
>
>
>
> we get the following assembly code:
>
>
>
> main:
>
>         lea     eax, [rdi+208+rdi*2]
>
>         ret
>
>
>
>
>
> As you can see the “if” condition has the same calculation for both the
> “if” and “else” branch in calculating the sum over the array, however, it
> seems that it is triggering the “loop splitting pass” which results in
> further optimizations such as constant folding of the whole computation and
> resulting in such a smaller and faster assembly code.
>
> My question is, why the compiler is not able to take advantage of
> constantans in the prefix of the array in the first place?
>
> Also adding a not necessary “if condition” which is just repeating the same
> code for "if" and "else", doesn’t seem to be the best way to hint the
> compiler to take advantage of this optimization; so is there another way to
> make the compiler aware of this? ( I used the -fsplit-loops flag and it
> didn't have any effect for this example.)
>
>
>
> As a next step if we use an array that has some constant values in the
> prefix but not a compile time constant length such as the following example:
>
> Code link is here: https://godbolt.org/z/3qGqshzn9
>
>
>
> #include <cstddef>
>
> inline int sum(const int array[], size_t len) {
>
>   int res = 0;
>
>   for (size_t i = 0; i < len; i++) {
>
>     if (i < 1)
>
>         res += array[i];
>
>     else
>
>         res += array[i];
>
>   }
>
>   return res;
>
> }
>
> int main(int argc, char** argv)
>
> {
>
>     size_t len = argc+3;
>
>     int arr3[len] = {600,10,1};
>
>     for (unsigned int i = 3; i < len; i++) arr3[i] = argc+i;
>
>     return sum(arr3, 2);
>
> }
>
>
>
> In this case the GCC compiler is not able to apply constant folding on the
> first part of the array!
>
> In general is there anyway that the GCC compiler would understand this and
> apply constant folding optimizations here?

GCC can currently only constant fold this when it decides to unroll the
loop portions operating in constant data.

Richard.

Reply via email to