On Fri, Apr 30, 2021 at 9:51 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote: > > * Feng Xue: > > > To simplify explanation of this memory gathering optimization, pseudo > > code on original nested loops is given as: > > > > outer-loop ( ) { /* data in inner loop are also invariant in outer loop. > > */ > > ... > > inner-loop (iter, iter_count) { /* inner loop to apply MGO */ > > > > Type1 v1 = LOAD_FN1 (iter); > > Type2 v2 = LOAD_FN2 (v1); > > Type3 v3 = LOAD_FN3 (v2); > > ... > > iter = NEXT_FN (iter); > > } > > > > ... > > } > > > > "LOAD_FNx()" is a conceptual function that translates its argument to a > > result, and is composed of a sequence of operations in which there is > > only one memory load. It is capable of representing simple memory > > dereference like "iter->field", or more complicated expression like > > "array[iter * 2 + cst].field". > > > > Likewise, "NEXT_FN()" is also a conceptual function which defines how an > > iterator is advanced. It is able to describe simple integer iterator, > > list iterator, or even pure-function-based iterator on advanced stuff > > like hashset/tree. > > > > Then the code will be transformed to: > > > > typedef struct cache_elem { > > bool init; > > Type1 c_v1; > > Type2 c_v2; > > Type3 c_v3; > > } cache_elem; > > > > cache_elem *cache_arr = calloc (iter_count, sizeof(cache_elem)); > > > > outer-loop () { > > ... > > size_t cache_idx = 0; > > > > inner-loop (iter, iter_count) { > > > > if (!cache_arr[cache_idx]->init) { > > v1 = LOAD_FN1 (iter); > > v2 = LOAD_FN2 (v1); > > v3 = LOAD_FN3 (v2); > > > > cache_arr[cache_idx]->init = true; > > cache_arr[cache_idx]->c_v1 = v1; > > cache_arr[cache_idx]->c_v2 = v2; > > cache_arr[cache_idx]->c_v3 = v3; > > } > > else { > > v1 = cache_arr[cache_idx]->c_v1; > > v2 = cache_arr[cache_idx]->c_v2; > > v3 = cache_arr[cache_idx]->c_v3; > > } > > ... > > cache_idx++; > > iter = NEXT_FN (iter); > > } > > ... > > } > > > > free (cache_arr); > > Why do you need the init field? Isn't the array initialized > sequentially? > > You need to keep the original loop and use it if calloc fails. The > transformation is not valid in general because calloc is not an > async-signal-safe function, and GCC shouldn't introduce such calls if > they are not present in the original source code. For > -fnon-call-exceptions, you need to introduce unwinding code that calls > free. > > Can you change this optimization so that it can use a fixed-size buffer? > This would avoid all issues around calling calloc. If iter_count can be > very large, allocating that much extra memory might not be beneficial > anyway.
It would need to be TLS storage though or protected against concurrent access (maybe an atomic is enough if we drop to the original loop when concurrent access is detected). Richard. > Thanks, > Florian >