Hi Razya,

great to hear about these Graphite plans. Some short comments.

On Tue, 2009-03-10 at 16:13 +0200, Razya Ladelsky wrote:
> [...]
>
> The first step, as we see it, will teach Graphite that parallel code needs
> to be produced.
> This means that Graphite will recognize simple parallel loops (using SCoP
> detection and data dependency analysis),
> and pass on that information.
> The information that needs to be conveyed expresses that a loop is
> parallelizable, and may also include annotations of more
> detailed information e.g, the shared/private variables.
>
> There are two possible models for the code generation:
> 1. Graphite will annotate parallel loops and pass that information all the
> way through CLOOG
> to the current autopar code generator to produce the parallel, GOMP based
> code.

It might be possible to recognize parallel loops in Graphite, but you
should keep in mind that loops do not yet exist in the Graphite
polyhedral representation. So you would have to predict which loops
CLOOG will produce. This might be possible, depending on how strict the
scheduling we give to CLOOG is. Another problem is that CLOOG might
split some loops automatically (where possible) to reduce the control
flow.

> 2. Graphite will annotate the parallel loops and CLOOG itself will be
> responsible of generating
> the parallel code.

The same as above. It will be hard to mark loops, as loops do not yet
exist.

> A point to notice here is that scalars/reductions are
> currently not
> handled in Graphite.

We are working heavily on this. Expect it to be ready by the end of
March at the latest, hopefully by the end of this week.

> In the first model, where Graphite calls autopar's code generation,
> scalars can be handled.

3. Wait for CLOOG to generate the new loops. As we still have the
polyhedral information (poly_bb_p) available during code generation, we
can try to update the dependency information using the restrictions
CLOOG added, and then use the polyhedral dependency analysis to check
whether there are any dependencies in the CLOOG generated loops.
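
To make the idea a bit more concrete, here is a toy sketch of the kind of check such a step would perform on an already generated loop nest. This is not Graphite or CLOOG code. It assumes constant loop bounds and simply enumerates all iterations instead of using the polyhedral dependency analysis, and the names `loop_is_parallel`, `bounds` and `accesses` are made up for illustration.

```python
from itertools import combinations, product


def loop_is_parallel(bounds, accesses, depth):
    """Return True if the loop at `depth` carries no data dependence.

    bounds   -- list of inclusive (lo, hi) pairs, outermost loop first
                (toy assumption: constant bounds)
    accesses -- list of (array_name, is_write, index_fn), where index_fn
                maps an iteration vector to a tuple of array subscripts
    depth    -- 0-based nesting level of the loop to test
    """
    # Record, for every memory cell, which iterations touch it and how.
    touched = {}
    for it in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        for array, is_write, index_fn in accesses:
            cell = (array, index_fn(it))
            touched.setdefault(cell, []).append((it, is_write))

    # The loop carries a dependence if two iterations that agree on all
    # outer iterators but differ at `depth` touch the same cell and at
    # least one of them writes it.
    for users in touched.values():
        for (p, p_writes), (q, q_writes) in combinations(users, 2):
            if not (p_writes or q_writes):
                continue  # read/read pairs are harmless
            if p[:depth] == q[:depth] and p[depth] != q[depth]:
                return False
    return True


# Toy nest:  for i in 0..3: for j in 1..3: A[i][j] = A[i][j-1] + 1
bounds = [(0, 3), (1, 3)]
accesses = [
    ("A", True, lambda it: (it[0], it[1])),       # write A[i][j]
    ("A", False, lambda it: (it[0], it[1] - 1)),  # read  A[i][j-1]
]
outer_parallel = loop_is_parallel(bounds, accesses, 0)
inner_parallel = loop_is_parallel(bounds, accesses, 1)
```

For this nest the outer `i` loop comes out as parallel and the inner `j` loop does not, since every `j` iteration reads the value written by the previous one. A real pass would ask the same question of the polyhedral dependency analysis rather than enumerating iterations.
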
So we can add a pass in between CLOOG and clast-to-gimple that marks
parallel loops.

Advantages:
- Can be 100% exact, with no forecasts, as we are working on the
  actually generated loops.
- Nice split of what is done where:
  1. Graphite is in charge of optimizations (generating parallelism).
  2. CodeGen just detects parallel loops and generates code for them.

> After Graphite finishes its analysis, it calls autopar's reduction
> analysis, and only then the code
> generation is called (if the scalar analysis determines that the loop
> still parallelizable, of course)
>
> Once the first step is accomplished, the following steps will focus on
> teaching Graphite
> to find loop transformations (such as skewing, interchange etc.) that
> expose coarse grain synchronization free parallelism.
> This will be heavily based on the polyhedral data dependence and
> transformation infrastructures.
> We have not determined which algorithm/ techniques we're going to use for
> this part.
>
> Having synchronization free parallelization integrated in Graphite, will
> set the ground for
> handling parallelism requiring a small amount of parallelization.

Yes, great. This will allow us to experiment with advanced auto
parallelization.

I am really looking forward to seeing the first patches!

> This is a rough view for our planned work on autopar in GCC.
> Please feel free to ask/comment.
>
> Thanks,
> Razya