On Thu, Jan 21, 2021 at 7:28 AM Feng Xue OS via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This patch implements a new loop optimization according to the proposal
> in RFC given at https://gcc.gnu.org/pipermail/gcc/2021-January/234682.html.
> So do not repeat the idea in this mail. Hope your comments on it.
>
> Bootstrapped/regtested on x86_64-linux and aarch64-linux.
>
> Thanks,
> Feng
>
> ----
> 2020-12-25  Feng Xue  <f...@os.amperecomputing.com>
>
> gcc/
>         PR tree-optimization/98598
>         * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o.
>         * common.opt (-ftree-loop-mgo): New option.

Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a good
name.  -floop-mgo would be better but still I'd have no idea what this would do.

I don't have a good suggestion here other than to expand it to
-floop-gather-memory (?!).

The option documentation isn't informative either.

From:

  outer-loop ()
    {
      inner-loop (iter, iter_count)
        {
          Type1 v1 = LOAD (iter);
          Type2 v2 = LOAD (v1);
          Type3 v3 = LOAD (v2);
          ...
          iter = NEXT (iter);
        }
    }

To:

  typedef struct cache_elem
    {
      bool   init;
      Type1  c_v1;
      Type2  c_v2;
      Type3  c_v3;
    } cache_elem;

  cache_elem *cache_arr = calloc (iter_count, sizeof (cache_elem));

  outer-loop ()
    {
      size_t cache_idx = 0;

      inner-loop (iter, iter_count)
        {
          if (!cache_arr[cache_idx]->init)
            {
              v1 = LOAD (iter);
              v2 = LOAD (v1);
              v3 = LOAD (v2);

              cache_arr[cache_idx]->init = true;
              cache_arr[cache_idx]->c_v1 = v1;
              cache_arr[cache_idx]->c_v2 = v2;
              cache_arr[cache_idx]->c_v3 = v3;
            }
          else
            {
              v1 = cache_arr[cache_idx]->c_v1;
              v2 = cache_arr[cache_idx]->c_v2;
              v3 = cache_arr[cache_idx]->c_v3;
            }
          ...
          cache_idx++;
          iter = NEXT (iter);
        }
    }

  free (cache_arr);

This is a _very_ special transform.  What it seems to do is
optimize the dependent loads for outer loop iteration n > 1
by caching the result(s).  If that's possible then you should
be able to distribute the outer loop to one doing the caching
and one using the cache.  Then this transform would be more
like a tradidional array expansion of scalars?  In some cases
also loop interchange could remove the need for the caching.

Doing MGO as the very first loop pass thus looks bad, I think
MGO should be much later, for example after interchange.
I also think that MGO should work in concert with loop
distribution (which could have an improved cost model)
rather than being a separate pass.

Your analysis phase looks quite expensive, building sth
like a on-the side representation very closely matching SSA.
It seems to work from PHI defs to uses, which looks backwards.

You seem to roll your own dependence analysis code :/  Please
have a look at loop distribution.

Also you build an actual structure type for reasons that escape
me rather than simply accessing the allocated storage at
appropriate offsets.

I think simply calling 'calloc' isn't OK because you might need
aligned storage and because calloc might not be available.
Please at least use 'malloc' and make sure MALLOC_ABI_ALIGNMENT
is large enough for the data you want to place (or perform
dynamic re-alignment yourself).  We probably want some generic
middle-end utility to obtain aligned allocated storage at some
point.

As said above I think you want to re-do this transform as
a loop distribution transform.  I think if caching works then
the loads should be distributable and the loop distribution
transform should be enhanced to expand the scalars to arrays.

Richard.



>         * opts.c (default_options_table): Enable -ftree-loop-mgo at -O3+.
>         * params.opt (mgo-max-dep-load-level): New parameter.
>         (mgo-max-cache-elem-size): Likewise.
>         (mgo-min-cache-array-length): Likewise.
>         (mgo-max-cache-array-length): Likewise.
>         * doc/invoke.texi (mgo-max-dep-load-level): Document new parameter.
>         (mgo-max-cache-elem-size): Likewise.
>         (mgo-min-cache-array-length): Likewise.
>         (mgo-max-cache-array-length): Likewise.
>         * passes.def: Add new pass_loop_mgo pass.
>         * timevar.def (TV_LOOP_MGO): New timevar.
>         * tree-pass.h (make_pass_loop_mgo): New declaration.
>         * tree-ssa-loop-mgo.c: New file.
>
> gcc/testsuite/
>         PR tree-optimization/98598
>         * gcc.dg/tree-ssa/mgo/mgo.exp: New file.
>         * gcc.dg/tree-ssa/mgo/mgo-common.h: New test header.
>         * gcc.dg/tree-ssa/mgo/list/list-1.c: New test.
>         * gcc.dg/tree-ssa/mgo/array/simple-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/simple-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/simple-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/param-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-5.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-6.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-7.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-8.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-9.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/dep-load-10.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-5.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/indx-iter-6.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-1.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-2.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-3.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-4.c: Likewise.
>         * gcc.dg/tree-ssa/mgo/array/outer-loop-5.c: Likewise.
> ---

Reply via email to