Hi Mark,

First of all, I would like this discussion to be on the GCC mailing list, so I am CCing it (I hope this is OK with all the others).

"Davis, Mark" <[EMAIL PROTECTED]> wrote on 31/03/2005 00:23:02:

> Mostafa & Gerald,
> ...
> It was mentioned that you folks had recently
> added SMS to gcc4.0, and I found the SMS paper from last year's gcc
> summit, and the description of SMS capabilities in gcc on the 4.0
> Features web site. So the obvious approach is to use SMS for Itanium as
> well as Power5 and ....
>
> 1) Is SMS in gcc currently turned on for anything other than Power5? I
> built a gcc4.0 for Itanium, and tried compiling the summation example
> from your paper (and some unrolled summation examples) using
> -O3 -fmodulo-sched

We haven't yet put any effort into tuning SMS for a specific architecture (including Power5); SMS is implemented as generally as the paper (mentioned in http://gcc.gnu.org/news/sms.html) describes.

>
> but didn't see any difference in the .s file from not using
> -fmodulo-sched. Are there other switches to turn on or dumps to look
> at?

I would suggest starting with the SMS dumps to see what SMS is doing there; you can get them by adding the -dm flag to your compilation. If you want, you can send me those dumps and I will look into them.

>
>
> I'm afraid I also was the origin of some of the "not very useful"
> comments about SMS. From my way of thinking, if SMS doesn't have alias
> information or array dependence analysis, then SMS can't pipeline loops
> storing into array elements; therefore it is not very useful as a
> pipeliner, even if the swing modulo scheduling part is excellent.
> 2) Did I miss something here?

This is true; that's why we need accurate alias info at the RTL level, and this is one of the efforts one should concentrate on in improving SMS.
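To make the point about array stores concrete, here is a hypothetical pair of C loops (the function names are illustrative, not taken from the SMS paper). In `scale_may_alias` nothing tells the compiler that `dst` and `src` do not overlap, so the store to `dst[i]` might feed the load of `src[i + 1]` in the next iteration; that possible loop-carried memory dependence keeps iterations from being overlapped. C99's `restrict` in `scale_no_alias` asserts the arrays are disjoint, which is the kind of fact alias information would have to carry down to RTL for SMS to use. This sketch makes no claim about what any particular GCC release actually does with either loop.

```c
#include <stddef.h>

/* No alias information: dst and src may overlap, so the store to dst[i]
   may change the value loaded from src[i + 1] in the next iteration.
   A pipeliner must treat this as a loop-carried memory dependence. */
void scale_may_alias(int *dst, const int *src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* C99 restrict promises dst and src do not overlap, so no memory
   dependence links one iteration's store to the next iteration's load;
   the iterations may be interleaved. Calling this with overlapping
   arrays is undefined behaviour. */
void scale_no_alias(int *restrict dst, const int *restrict src, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Calling `scale_may_alias(a + 1, a, 3, 2)` on `{1, 2, 3, 4}` yields `{1, 2, 4, 8}`: each store really does feed the next load, which is exactly the case the compiler must guard against when it lacks alias information.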
>
> I do not know about gcc internals (which is why I'm "project-managing",
> not "implementing"), so it was interesting and disturbing to hear what
> you and Vlad had to say about the different internal representations
> relative to when the SMS phase runs:
> a) it seems to be too early to see the machine code
> b) it's too late to have alias info
>
> 3) Do you agree with this assessment?

It's not black and white. We need accurate alias info at the RTL level to be able to software pipeline (SMS) the majority of loops in real-world programs; currently the RTL alias info is not accurate enough for those loops. Having the alias info lets us eliminate memory-to-memory dependencies, and thus tells us that we can interleave different iterations of the loop.

The alias info is usually based on a high-level representation of the code; the lower you go, the more information you lose. One thing we would need to do is maintain this information as we go down through the tree and RTL representations. Each pass that transforms the code will require additional effort to maintain the alias information, which complicates things; that's why we want SMS to run as early as possible.

The other side of the coin is the modeling of the machine resources (SMS is trying to solve a scheduling problem). In SMS we use the DFA for resource modeling: we follow the resource usage of each instruction and try to get the optimal schedule by moving instructions among the different iterations while avoiding resource conflicts. The problem with doing this early is that later passes can change the resource usage of instructions when transforming the code (splitting instructions, for example), and thus make the schedule no longer optimal. A good example of a way to handle this is disabling the second scheduling pass for SMSed loops, to prevent it from ruining the schedules generated by SMS.
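The resource side of the problem can be illustrated with a modulo reservation table (MRT). The following is a hypothetical, much-simplified modulo scheduler in C, not the GCC implementation: it ignores loop-carried dependences and register pressure, places operations greedily in dependence order rather than in the swing order SMS uses, and replaces GCC's DFA with a plain count of units per resource class. What it does show is the central constraint: an operation issued at cycle t occupies row t mod II, so operations from different iterations compete for the same resources.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_II 32
#define NUM_CLASSES 2

/* Hypothetical loop-body operation: the resource class it occupies for
   one cycle and an optional intra-iteration predecessor. Ops must be
   listed so that pred < own index (dependence order). */
struct op {
    int res_class;   /* 0 = memory port, 1 = ALU (illustrative only) */
    int pred;        /* index of predecessor op, or -1 */
    int latency;     /* cycles after pred issues before this op may issue */
};

/* Resource-constrained lower bound on II: ceil(uses / units) for each
   class, maximised over all classes. */
int res_mii(const struct op *ops, int n, const int units[NUM_CLASSES])
{
    int uses[NUM_CLASSES] = {0};
    for (int i = 0; i < n; i++)
        uses[ops[i].res_class]++;
    int mii = 1;
    for (int c = 0; c < NUM_CLASSES; c++) {
        int need = (uses[c] + units[c] - 1) / units[c];
        if (need > mii)
            mii = need;
    }
    return mii;
}

/* Greedily place each op at the first cycle >= its earliest start whose
   MRT row (cycle % ii) still has a free unit of its class. Only ii
   consecutive cycles are tried, since after that the rows repeat.
   Fills time[] and returns true if every op fits. */
bool modulo_schedule(const struct op *ops, int n, const int units[NUM_CLASSES],
                     int ii, int time[])
{
    int mrt[MAX_II][NUM_CLASSES];
    memset(mrt, 0, sizeof mrt);
    for (int i = 0; i < n; i++) {
        int earliest = 0;
        if (ops[i].pred >= 0)
            earliest = time[ops[i].pred] + ops[i].latency;
        bool placed = false;
        for (int t = earliest; t < earliest + ii && !placed; t++) {
            int *row = mrt[t % ii];
            if (row[ops[i].res_class] < units[ops[i].res_class]) {
                row[ops[i].res_class]++;
                time[i] = t;
                placed = true;
            }
        }
        if (!placed)
            return false;
    }
    return true;
}

/* Start at ResMII and raise II until the loop body fits; -1 if none. */
int find_ii(const struct op *ops, int n, const int units[NUM_CLASSES], int time[])
{
    for (int ii = res_mii(ops, n, units); ii <= MAX_II; ii++)
        if (modulo_schedule(ops, n, units, ii, time))
            return ii;
    return -1;
}
```

For a body of two loads, an add that uses the second load (latency 2) and a store of the sum (latency 1), on a machine with one memory port and one ALU, this finds II = 3 with issue cycles 0, 1, 3 and 5. One iteration spans six cycles while a new one starts every three, so consecutive iterations overlap. A later pass that splits one of these instructions would change the per-row resource counts the schedule was built on.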
We could treat other passes the same way, with a cost model to decide whether it is beneficial to perform the optimization inside the SMSed loops. Another problem that results from doing SMS before register allocation is increased register pressure when SMS is aggressive. IMO, this problem should be addressed later by using register pressure estimation inside SMS.

> 4) Do you have any suggestions about using SMS for an in-order
> microarchitecture like Itanium which is more sensitive to the exact
> schedule than OOO microarchitectures like Power5?

Actually, I would say that an in-order machine would benefit more from SMS than an OOO machine, because in many cases OOO machines theoretically do the job of SMS in hardware. The problem is not that IA64 is in-order, but that IA64 and other in-order machines depend heavily on scheduling, SMS included. My suggestion is that we must invest in lowering alias info to RTL and feeding this information to the DDG used by SMS, which is implemented in ddg.c.

> 5) In the Intel compiler for Itanium, we carry the alias information
> from the high-level IL down to the machine-code level IL, and pipeline
> on the machine-code IL, before register allocation.

This is where SMS is currently positioned, which means our problem is not where SMS is performed but that the alias information does not get there. This is exactly what we have been thinking all along; the Intel compiler example reinforces this thought.

>
> thanks,
> Mark Davis
> Intel Compiler Lab
> (formerly with DEC compiler team)
> Nashua, NH
>

Mostafa.
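Why alias info in the DDG matters can be sketched in a few lines of C. The struct and function names below are invented for illustration and are not the ddg.c interface. Each memory reference is tagged with the object it accesses, or -1 when that information was lost on the way down to RTL. Without information, every pair involving a store gets both an intra-iteration and a loop-carried edge; once two references are known to touch distinct objects, both edges can be dropped and SMS is free to overlap iterations.

```c
#include <stdbool.h>

/* Hypothetical memory reference in a loop body. base_id names the
   object accessed; -1 means the compiler no longer knows which one. */
struct mem_ref {
    int base_id;
    bool is_store;
};

/* Conservative may-alias test: only references to two different,
   known objects are proven independent. */
static bool may_alias(const struct mem_ref *a, const struct mem_ref *b)
{
    if (a->base_id < 0 || b->base_id < 0)
        return true;               /* no information: assume overlap */
    return a->base_id == b->base_id;
}

/* Count the memory edges a DDG would need for each pair of references
   (i listed before j): one intra-iteration edge i -> j and one
   loop-carried edge j -> i. Load/load pairs need no edge; self-edges
   of a store across iterations are ignored for brevity. */
int count_mem_edges(const struct mem_ref *refs, int n)
{
    int edges = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            if (!refs[i].is_store && !refs[j].is_store)
                continue;
            if (may_alias(&refs[i], &refs[j]))
                edges += 2;
        }
    return edges;
}
```

For a loop such as `dst[i] = src[i] * k`, a load of `src` and a store to `dst` whose targets are unknown produce two edges, one of them loop-carried, which serialises the iterations; with the two arrays identified as distinct objects the count drops to zero.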