Hello, I want to propose a project for Google Summer of Code on title "New static scheduling heuristic". I hope that Vlad Makarov from Redhat or Andrey Belevantsev from ISP RAS will menthor this application. I will appreciate any feedback and will try to answer any questions regarding my application.
======================================================================== Abstract Most scheduling approaches consist of two steps: computation of sets of available instructions at the current scheduling point, and choosing one of them to become the next instruction. This approach is used in GCC compiler in current implementation of list scheduler (known as Haifa scheduler) and new approach under developing – selective scheduling. While most activities aimed at improving scheduling quality are trying to increase number of gathered available instructions (such as above-mentioned selective scheduling, treegion scheduling, improvements of region formation), choosing step plays an important role too. Current implementation of scheduler uses static priorities, based on critical path heuristic, which is known to behave well while scheduling single basic blocks and become worse when expanding scheduling boundaries to extended basic blocks or regions. This project aims at developing a new priorities system, which would consider not only the critical path, but other possible exits from scheduling region and their probabilities. Current implementation First of all, depending on area, from which available instructions are gathered, there are two schedulers present in GCC, namely sched-ebb and sched-region. The scheduling infrastructure of these schedulers is essentially the same, driven by haifa-sched. Current implementation of scheduling priorities is divided onto two phases – static and dynamic. The most important is static part – it is considered first, and in case of equal static priorities, dynamic priority is used to determine which one of instructions is better. Given a dependence graph, static priority is calculated as default instruction latency for all instructions without successors in dependence graph, and maximum over all successors of sum of priority of successor and cost of dependency – for other instructions. This way, considering only one basic block, instructions that are on the critical path to basic block exit are the most prioritized ones. It is important to note that for sched-rgn dependencies leading from one basic block to other are just discarded. Drawbacks of current implementation Critical path heuristic begins to fail to find good priorities when used on areas, bigger than single basic block, i.e. with several exits. For extended basic block this heuristic tends to give most priority to instructions that contribute to the last (with the longest path from entry in dependence graph) exit. At the same time instructions, contributing to other exits, are delayed. Consequently, if earliest exit from extended basic block is taken most of the time, this can lead to performance regression. Behavior of critical path heuristic in case of region scheduling is trickier. From the definition of priority, instructions from the end of basic block are generally less prioritized than from the beginning. Hence, when scheduling the end part of some basic block in region, instructions from beginning of successive basic blocks may have bigger priority than instructions from the end of current basic block. This, in turn, can lead to performance regression too. Ways to overcome drawbacks The drawbacks of current priority scheme can be evaded in several ways. First, critical path priorities can be adjusted dynamically with regard to the current scheduling point. Second, compute new static priorities with regard to probabilities of taking corresponding exits. The background idea of this approach is that instruction priority is increased for some amount for each exit it contributes to. This amount depends on the probability of taking current exit and the depth of instruction in dependence graph, consisting only from instructions, contributing to current exit. In this way, instructions contributing to several exits will get higher priority. Likewise, instructions that are situated higher in dependence graph will get lower priority. Third, taking into account processor resources while creating data dependence graph can increase accuracy of priority calculation. I aim mostly at the implementation of second approach as it seems to solve known problems and then to improve it with resource-aware priority computation. Roadmap I plan to implement and test on ia64 platform all three approaches to solving static scheduling priority problem for sched-ebb and sched-rgn and do it later for selective scheduling.