machine learning for loop unrolling

Stefan Ciobaca Fri, 08 Jun 2007 06:30:30 -0700

Hello everyone,

For my bachelor thesis I'm modifying gcc to use machine learning to
predict the optimal unroll factor for different loops (inspired from
this paper: http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-938.pdf).


I've compiled gcc 4.1.2 on my machine and I've located the
loop-unroll.c file, where the unrolling is actually done (Richard
Guenther also suggested that I loop in tree-ssa-loop*.c, because some
unrolling is done at that level. From what I can tell, only complete
unrolling is done there, which is not interesting from a machine
learning point of view).

To learn the best unroll factor, I need several features for each
loop. I've already figured out how to compute some of them:

- number of operations (num_loop_insns)
- number of branches (num_loop_branches)
- loop nest level (can be easily computed by walking struct loops *)

but there are a lot of other features suggested by the authors of the
paper, for which I couldn't locate a convenient function:

The number of floating point ops. in loop body.
The number of memory ops. in loop body.
The number of operands in loop body.
The number of implicit instructions in loop body.
The number of unique predicates in loop body.
The estimated latency of the critical path of loop.
The estimated cycle length of loop body.
The language (C or Fortran).
The number of parallel "computations" in loop.
The max. dependence height of computations.
The max. height of memory dependencies of computations.
The max. height of control dependencies of computations.
The average dependence height of computations.
The number of indirect references in loop body.
The min. memory-to-memory loop-carried dependence.
The number of memory-to-memory dependencies.
The tripcount of the loop (-1 if unknown).
The number of uses in the loop.h
The number of defs. in the loop.

Of course, not all of these are relevant to gcc. I'm looking at ways
to compute some of these features, hopefully the most relevant ones.
If there is already some work done that I could use in order to
compute some of these features, I'd be glad if you could tell me about
it. Also, if anyone can think of some useful features, related to the
target architecture or the loops structure, I'd be glad to hear about
them.

Also, I'm interested in some benchmarks. Many of the research papers
that describe compiler optimizations use the SPEC* benchmarks, but
these are not free, so I'm looking for alternatives. Currently I'm
looking into:

- OpenBench
- Botan
- CSiBE
- Polyhedron

(thanks to richi of #gcc for the last 3)

Do you know any other one that would be better?

Here is how I'm thinking of conducting the experiment:

- for each innermost loop:
   - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
measure the time the benchmark takes
   - write down the loop features and the best unroll factor
- apply some machine learning technique to the above data to determine
the correlations between loop features and best unroll factor
- integrate the result into gcc and measure the benchmarks again

Do you think it is ok to only consider inner-most loops? What about
the unroll factors? Should I consider bigger unroll factors? Do you
think the above setup is ok?

I welcome any feedback on this.

Thank you,

Stefan Ciobaca

machine learning for loop unrolling

Reply via email to