On Fri, May 27, 2022 at 11:13 PM Laleh Beni via Gcc <gcc@gcc.gnu.org> wrote: > > GCC compiler is able to understand if the prefix of an array holds > constant/static data and apply compiler optimizations on that partial > constant part of the array, however, it seems that it is not leveraging > this information in all cases. > > On understanding the behavior of compiler optimization for partially > constant arrays and especially how the loop splitting pass could have an > influence on the potential constant related optimizations such as constant > folding I am using the following example: > > > > Considering an array where the prefix of that array is compile-time > constant data, and the rest of the array is runtime data, should the > compiler be able to optimize the calculation for the first part of the > array? > > Let's look at the below example: > > > > You can see the code and its assembly here: https://godbolt.org/z/xjxbz431b > > > > #include <cstddef> > > inline int sum(const int array[], size_t len) { > > int res = 0; > > for (size_t i = 0; i < len; i++) { > > res += array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > int arr1[6] = {200,2,3, argc, argc+1, argc+2}; > > return sum(arr1, 6); > > } > > > > > > In our sum function we are measuring the some of the array elements, where > the first half of it is static compile-time constants and the second half > are dynamic data. > > When we compile this with the "x86-64 GCC 12.1" compiler with "-O3 > -std=c++2a " flags, we get the following assembly code: > > > > main: > > mov rax, QWORD PTR .LC0[rip] > > mov DWORD PTR [rsp-28], edi > > mov DWORD PTR [rsp-32], 3 > > movq xmm1, QWORD PTR [rsp-32] > > mov QWORD PTR [rsp-40], rax > > movq xmm0, QWORD PTR [rsp-40] > > lea eax, [rdi+1] > > add edi, 2 > > mov DWORD PTR [rsp-24], eax > > paddd xmm0, xmm1 > > mov DWORD PTR [rsp-20], edi > > movq xmm1, QWORD PTR [rsp-24] > > paddd xmm0, xmm1 > > movd eax, xmm0 > > pshufd xmm2, xmm0, 0xe5 > > movd edx, xmm2 > > add eax, edx > > ret > > .LC0: > > .long 200 > > .long 2 > > > > > > However, if we add an “if” condition in the loop for calculating the result > of the sum, the if condition seems to enable the loop splitting pass: > > > > You can see the code and its assembly here: https://godbolt.org/z/ejecbjMKG > > > > #include <cstddef> > > inline int sum(const int array[], size_t len) { > > int res = 0; > > for (size_t i = 0; i < len; i++) { > > if (i < 1) > > res += array[i]; > > else > > res += array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > int arr1[6] = {200,2,3, argc, argc+1, argc+2}; > > return sum(arr1, 6); > > } > > > > > > we get the following assembly code: > > > > main: > > lea eax, [rdi+208+rdi*2] > > ret > > > > > > As you can see the “if” condition has the same calculation for both the > “if” and “else” branch in calculating the sum over the array, however, it > seems that it is triggering the “loop splitting pass” which results in > further optimizations such as constant folding of the whole computation and > resulting in such a smaller and faster assembly code. > > My question is, why the compiler is not able to take advantage of > constantans in the prefix of the array in the first place? > > Also adding a not necessary “if condition” which is just repeating the same > code for "if" and "else", doesn’t seem to be the best way to hint the > compiler to take advantage of this optimization; so is there another way to > make the compiler aware of this? ( I used the -fsplit-loops flag and it > didn't have any effect for this example.) > > > > As a next step if we use an array that has some constant values in the > prefix but not a compile time constant length such as the following example: > > Code link is here: https://godbolt.org/z/3qGqshzn9 > > > > #include <cstddef> > > inline int sum(const int array[], size_t len) { > > int res = 0; > > for (size_t i = 0; i < len; i++) { > > if (i < 1) > > res += array[i]; > > else > > res += array[i]; > > } > > return res; > > } > > int main(int argc, char** argv) > > { > > size_t len = argc+3; > > int arr3[len] = {600,10,1}; > > for (unsigned int i = 3; i < len; i++) arr3[i] = argc+i; > > return sum(arr3, 2); > > } > > > > In this case the GCC compiler is not able to apply constant folding on the > first part of the array! > > In general is there anyway that the GCC compiler would understand this and > apply constant folding optimizations here?
GCC can currently only constant fold this when it decides to unroll the loop portions operating in constant data. Richard.