Re: machine learning for loop unrolling

Grigori Fursin Sun, 10 Jun 2007 14:54:11 -0700

Hi,

In addition to Kenneth's reply, here are a few references you may want to look 
at:


Edwin Bonilla, "Predicting Good Compiler Transformations Using Machine 
Learning", 
MS Thesis, School of Informatics, University of Edinburgh, UK, October 2004. 
http://www.inf.ed.ac.uk/publications/thesis/online/IM040129.pdf
It's about using machine learning to predict loop unrolling.

F. Agakov, E. Bonilla, J. Cavazos, B. Franke, G. Fursin, M.F.P. O'Boyle, J. 
Thomson, 
M. Toussaint and C.K.I. Williams. Using Machine Learning to Focus Iterative 
Optimization. 
Proceedings of the 4th  Annual International Symposium on Code Generation and 
Optimization (CGO), 
New York, NY, USA, March 2006
http://fursin.net/papers/abcp2006.pdf

You may also want to look at our project on GCC Interactive Compilation 
Interface (GCC-ICI)
to access internal GCC transformations to enable external optimizations 
particularly
using machine learning (we are now working on a new version which should be 
available
in mid/end of summer):

http://gcc-ici.sourceforge.net
http://www.hipeac.net/system/files?file=7_Fursin.pdf

Hope it will be of any help,
Grigori Fursin

=====================================
Grigori Fursin, PhD
Research Fellow, INRIA Futurs, France
http://fursin.net/research




Re: machine learning for loop unrolling
From: Kenneth Hoste <kenneth dot hoste at elis dot ugent dot be> 
To: stefan dot ciobaca+gcc at gmail dot com 
Cc: GCC <gcc at gcc dot gnu dot org> 
Date: Fri, 8 Jun 2007 21:04:05 +0200 
Subject: Re: machine learning for loop unrolling 
References: <[EMAIL PROTECTED]> 

--------------------------------------------------------------------------------

On 08 Jun 2007, at 16:31, Stefan Ciobaca wrote:


Hello everyone,

For my bachelor thesis I'm modifying gcc to use machine learning to
predict the optimal unroll factor for different loops (inspired from
this paper: http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS- TR-938.pdf). 

Interesting. I'm using evolutionary algorithms for similar purposes in my 
current research...


<snip>


Of course, not all of these are relevant to gcc. I'm looking at ways
to compute some of these features, hopefully the most relevant ones.
If there is already some work done that I could use in order to
compute some of these features, I'd be glad if you could tell me about
it. Also, if anyone can think of some useful features, related to the
target architecture or the loops structure, I'd be glad to hear about
them.


I'm afraid I can't help here, I'm not familiar at all with GCCs internals.


Also, I'm interested in some benchmarks. Many of the research papers
that describe compiler optimizations use the SPEC* benchmarks, but
these are not free, so I'm looking for alternatives. Currently I'm
looking into:


- OpenBench
- Botan
- CSiBE
- Polyhedron

(thanks to richi of #gcc for the last 3)

Do you know any other one that would be better? 

But I can help here. Polyhedron is Fortran-only, but are well-suited for timing 
experiments (i.e.
they run long enough to have reasonable running times, but aren't too long 
either).
CSiBE is more targetted to code size, I believe the runtimes are ridicously 
small. I'm not familiar
with the other two.
Some other possibilities:

* MiDataSets (also fairly small when run only once, but the suite allows you to 
adjust the outer
loop iteration count to increase runtimes) 
[http://sourceforge.net/projects/midatasets]
* MediaBench / MediaBench II: multimedia workloads, which typically iterate 
over frames for example
[http://euler.slu.edu/~fritts/ mediabench/]
* BioMetricsWorkload [http://www.ideal.ece.ufl.edu/main.php?action=bmw]
* BioPerf: gene sequence analysis, ... [http://www.bioperf.org/]
* some other benchmarks commonly used when testing GCC [http:// 
www.suse.de/~gcctest]

I've been using the above with GCC and most work pretty well (on x86).



Here is how I'm thinking of conducting the experiment:


- for each innermost loop:
  - compile with the loop unrolled 1x, 2x, 4x, 8x, 16x, 32x and
measure the time the benchmark takes
  - write down the loop features and the best unroll factor
- apply some machine learning technique to the above data to determine
the correlations between loop features and best unroll factor


Any idea which? There's a huge number of different techniques out there, 
choosing an appropiate one
is critical to success.


- integrate the result into gcc and measure the benchmarks again 

When using machine learning techniques to build some kind of model, a common 
technique is
crossvalidation.
Say you have 20 benchmarks, no matter which ones. You use the larger part of 
those (for example 15)
to build the model (i.e. determine the correlations between loop features and 
best unroll factor),
and then test performance of that on the other ones. The important thing is not 
to use the
benchmarks you test with when using the machine learning technique. That way, 
you can (hopefully)
show that the stuff you've learned should work for other programs too.


Do you think it is ok to only consider inner-most loops? What about
the unroll factors? Should I consider bigger unroll factors? Do you
think the above setup is ok?


I'd say: don't be afraid to try too much. Try insane unroll factors too, just 
for testing purposes.
You don't want to limit yourself to 32x when the optimal could be 64x for 
example.



I welcome any feedback on this. 

Who is your advisor on this? Where are you doing your bachelors thesis?

greetings,

Kenneth

--

Computer Science is no more about computers than astronomy is about telescopes. 
(E. W. Dijkstra)


Kenneth Hoste
ELIS - Ghent University
email: [EMAIL PROTECTED]
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste

Re: machine learning for loop unrolling

Reply via email to