Cross-posting Reza's call for feedback to the binutils list since it is relevant - see the last few paragraphs regarding how to "solve the alignment problem".
Original thread: http://gcc.gnu.org/ml/gcc/2010-06/threads.html#00402 Not sure if followups should occur on one list or both. -- Quentin Neill On Thu, Jun 10, 2010 at 12:20 PM, reza yazdani <yazdani_r...@yahoo.com> wrote: > Hi, > > We are in the process of adding a feature to GCC to take advantage > of a new hardware feature in the latest AMD micro processor. This > feature requires a certain mix, ordering and alignments in > instruction sequences to obtain the expected hardware performance. > > I am asking the community to review this high level implementation > design and give me direction or advice. > > The new hardware issues two windows of the size N bytes of > instructions in every cycle. It goes into accelerate mode if the > windows have the right combination of instructions or alignments. Our > goal is to maximize the IPC by proper instruction scheduling and > alignments. > > Here is a summary of the most important requirements: > > a) Maximum of N instructions per window. > b) An instruction may cross the first window. > c) Each window can have maximum of x memory loads and y memory > stores . > d) The total number of immediate constants in the instructions > of a window should not exceed k. > e) The first window must be aligned on 16 byte boundary. > f) A Window set terminates when a branch exists in a window. > g) The number of allowed prefixes varies for instructions. > h) A window set needs to be padded by prefixes in instructions > or terminated by nops to ensure adherence to the rules. > > We have the following implementation plan for GCC: > > 1) Modify the Haifa scheduler to make the desired arrangement of > instructions for the two dispatch windows. The scheduler is called > once before and once after register allocation as usual. In both > cases it performs dispatch scheduling along with its normal job of > instruction scheduling. > > The advantage of doing it before register allocation is avoiding > extra dependencies caused by register allocation which may become > an obstacle to movement of instructions. The advantage of doing > it after register allocation is a consideration for spilling code > which may be generated by the register allocator. > > The algorithm we use is: > > a) Considering the current dispatch window set, choose the first > instruction from ready queue that does not violate dispatch rules. > b) When an instruction is selected and scheduled, inform the > dispatcher code about the instruction. This step keeps track of the > instruction content of windows for future evaluation. It also manages > the window set by closing and opening new virtual dispatch windows. > > 2) Insertion of alignment code. > > In x86 alignment is done by inserting prefixes or by generating > nops. As the object code is generated by the assembler in GCC, some > information such as sizes of branches are unknown until assembly or > link time. To do alignments related to dispatch correctly in GCC, > we need to iteratively compute prefixes and branch sizes until > its convergence. This pass currently does not exist in GCC, but it > exists in the assembler. > > There are two possible approaches to solve alignment problem. > > a) Let the assembler performs the alignments and padding needed > to adhere with the new machine dispatching rules and avoid an extra > pass in GCC. > b) Add a new pass to mimic what assembler does before generating > the assembly listing in GCC and insert the required alignments. > > I appreciate your comments on the proposed implementation procedure > and the choices a or b above. > > Reza Yazdani