On Fri, Apr 30, 2021 at 9:51 AM Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
> * Feng Xue:
>
> > To simplify explanation of this memory gathering optimization, pseudo
> > code on original nested loops is given as:
> >
> >   outer-loop ( ) { /* data in inner loop are also invariant in outer loop.  
> > */
> >       ...
> >       inner-loop (iter, iter_count) {  /* inner loop to apply MGO */
> >
> >           Type1 v1 = LOAD_FN1 (iter);
> >           Type2 v2 = LOAD_FN2 (v1);
> >           Type3 v3 = LOAD_FN3 (v2);
> >           ...
> >           iter = NEXT_FN (iter);
> >       }
> >
> >      ...
> >   }
> >
> > "LOAD_FNx()" is a conceptual function that translates its argument to a
> > result, and is composed of a sequence of operations in which there is
> > only one memory load. It is capable of representing simple memory
> > dereference like "iter->field", or more complicated expression like
> > "array[iter * 2 + cst].field".
> >
> > Likewise, "NEXT_FN()" is also a conceptual function which defines how an
> > iterator is advanced. It is able to describe simple integer iterator,
> > list iterator, or even pure-function-based iterator on advanced stuff
> > like hashset/tree.
> >
> > Then the code will be transformed to:
> >
> >   typedef struct cache_elem {
> >       bool   init;
> >       Type1  c_v1;
> >       Type2  c_v2;
> >       Type3  c_v3;
> >   } cache_elem;
> >
> >   cache_elem *cache_arr = calloc (iter_count, sizeof(cache_elem));
> >
> >   outer-loop () {
> >       ...
> >       size_t cache_idx = 0;
> >
> >       inner-loop (iter, iter_count) {
> >
> >           if (!cache_arr[cache_idx]->init) {
> >               v1 = LOAD_FN1 (iter);
> >               v2 = LOAD_FN2 (v1);
> >               v3 = LOAD_FN3 (v2);
> >
> >               cache_arr[cache_idx]->init = true;
> >               cache_arr[cache_idx]->c_v1 = v1;
> >               cache_arr[cache_idx]->c_v2 = v2;
> >               cache_arr[cache_idx]->c_v3 = v3;
> >           }
> >           else {
> >               v1 = cache_arr[cache_idx]->c_v1;
> >               v2 = cache_arr[cache_idx]->c_v2;
> >               v3 = cache_arr[cache_idx]->c_v3;
> >           }
> >           ...
> >           cache_idx++;
> >           iter = NEXT_FN (iter);
> >       }
> >       ...
> >   }
> >
> >   free (cache_arr);
>
> Why do you need the init field?  Isn't the array initialized
> sequentially?
>
> You need to keep the original loop and use it if calloc fails.  The
> transformation is not valid in general because calloc is not an
> async-signal-safe function, and GCC shouldn't introduce such calls if
> they are not present in the original source code.  For
> -fnon-call-exceptions, you need to introduce unwinding code that calls
> free.
>
> Can you change this optimization so that it can use a fixed-size buffer?
> This would avoid all issues around calling calloc.  If iter_count can be
> very large, allocating that much extra memory might not be beneficial
> anyway.

It would need to be TLS storage though or protected against concurrent
access (maybe an atomic is enough if we drop to the original loop when
concurrent access is detected).

Richard.

> Thanks,
> Florian
>

Reply via email to