On 26/11/2012, at 1:28 PM, Greg McGary wrote:

> I'm working on a port to a VLIW DSP with an exposed pipeline (i.e., no
> interlocks). Some operations (OP) have as much as 2-cycle latency on values
> of the call-preserved regs (CPR). E.g., if the callee's epilogue restores a
> CPR in the delay slot of the return instruction, then any OP with that CPR
> as input needs to schedule 2 clocks after the call in order to get the
> expected value. If OP schedules immediately after the call, then it will
> get the callee's value prior to the epilogue restore.
>
> The easy, low-performance way to solve the problem is to schedule
> epilogues to restore CPRs before the return and its delay slot. The
> harder, usually better-performing way is to manage dependences in the
> caller so that uses of CPRs for OPs that require extra cycles schedule
> at sufficient distance from the call.
>
> How shall I introduce these dependences only for the scheduler? As an
> experiment, I added CLOBBERs to the call insn, which created true
> dependences between the call and downstream instructions that read the
> CPRs, but had the undesired effect of perturbing dataflow across calls.
> I'm thinking sched-deps needs new code for targets with
> TARGET_SCHED_EXPOSED_PIPELINE to add dependences for call-insn producers
> and CPR-user consumers.
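The hazard described above can be modeled in a few lines. This is a toy sketch, not GCC code: `value_seen`, the register number, the scratch value, and the assumption that the restore issued in the return's delay slot becomes visible 2 cycles after the call are all hypothetical. Because nothing interlocks, a reader that issues too early does not stall; it silently gets the callee's stale value.

```python
from typing import Dict, List, Tuple

# (register, value, cycle at which the write becomes visible)
PendingWrite = Tuple[int, int, int]


def value_seen(reg: int, cycle: int, regfile: Dict[int, int],
               pending: List[PendingWrite]) -> int:
    # Exposed pipeline: nothing stalls the reader, so a read at CYCLE sees a
    # pending write only once it has landed (later list entries win).
    value = regfile[reg]
    for wreg, wvalue, lands_at in pending:
        if wreg == reg and cycle >= lands_at:
            value = wvalue
    return value


# Hypothetical: r1 is a CPR the callee used as scratch (0xDEAD); its epilogue
# restores the caller's 42 in the return's delay slot, and that write lands
# 2 cycles after the call (cycles are counted from the call).
regfile = {1: 0xDEAD}
pending = [(1, 42, 2)]
print([value_seen(1, c, regfile, pending) for c in range(3)])
# [57005, 57005, 42]
```

An OP reading r1 in cycle 0 or 1 sees 0xDEAD (57005); only from cycle 2 on does it see the restored 42.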
You essentially need a fix-up pass just before the end of compilation (machine-dependent reorg, if memory serves me right) to space instructions that consume values from CPRs away from the CALL_INSNs that set those CPRs. I.e., for 99% of compilation you don't care about this restriction; it's only the very last VLIW bundling and delay slot passes that need to know about it.

You probably want to make the 2nd scheduler pass run as machine-dependent reorg (as ia64 does) and enable an additional constraint (through a scheduling bypass) for the scheduler DFA to space CALL_INSNs from their consumers by at least 2 cycles.

One challenge here is that the scheduler operates on basic blocks, and it is difficult to track dependencies across basic block boundaries. To work around the basic-block scope of the scheduler, you could emit dummy instructions at the beginning of basic blocks that have predecessors ending with CALL_INSNs. These dummy instructions would set the appropriate registers (probably just assign each register to itself), and a bypass (see define_bypass) between these dummy instructions and their consumers would guarantee the 2-cycle delay.

-- 
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics
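A rough model of the suggested approach, written in Python for brevity. Everything here is hypothetical (`Insn`, `schedule`, `add_dummies`, `NUM_CPRS`, and the assumption that r0..r3 are the CPRs); it is not sched-deps or the DFA machinery. The call is treated as setting every CPR only in the scheduler's dependence view, `latency` stands in for a define_bypass, and `add_dummies` prepends `r = r` self-sets to a block whose predecessor ends with a call.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_CPRS = 4  # hypothetical: r0..r3 are the call-preserved registers


@dataclass
class Insn:
    kind: str                   # "call", "dummy" (r = r at block head) or "op"
    dest: Optional[int] = None  # register written, if any
    src: Optional[int] = None   # register read, if any
    cycle: int = 0              # filled in by schedule()


def is_cpr(reg: Optional[int]) -> bool:
    return reg is not None and 0 <= reg < NUM_CPRS


def sched_writes(p: Insn, reg: int) -> bool:
    # Scheduler-only view: a call "sets" every CPR, so CPR readers depend
    # on it; no CLOBBER is added, so dataflow across the call is untouched.
    if p.kind == "call":
        return is_cpr(reg)
    return p.dest == reg


def latency(p: Insn, c: Insn) -> int:
    # Stands in for a define_bypass: 2 cycles from a call or a dummy
    # self-set to a CPR consumer, 1 cycle for any other true dependence.
    if p.kind in ("call", "dummy") and is_cpr(c.src):
        return 2
    return 1


def schedule(block: List[Insn]) -> List[int]:
    # In-order issue into VLIW bundles: an insn may share its predecessor's
    # cycle, but must wait for its nearest producer plus the latency.
    for i, insn in enumerate(block):
        cycle = block[i - 1].cycle if i > 0 else 0
        if insn.src is not None:
            for p in reversed(block[:i]):
                if sched_writes(p, insn.src):
                    cycle = max(cycle, p.cycle + latency(p, insn))
                    break
        insn.cycle = cycle
    return [insn.cycle for insn in block]


def add_dummies(block: List[Insn], pred_ends_with_call: bool) -> List[Insn]:
    # Work around the block-local scheduler: if a predecessor ends with a
    # call, prepend "r = r" for every CPR this block reads.
    if not pred_ends_with_call:
        return block
    cprs = sorted({i.src for i in block if is_cpr(i.src)})
    return [Insn("dummy", dest=r, src=r) for r in cprs] + block


# r5 = ...; call; r6 = f(r5); r7 = f(r1), where r1 is a CPR
a = [Insn("op", dest=5), Insn("call"),
     Insn("op", dest=6, src=5), Insn("op", dest=7, src=1)]
print(schedule(a))  # [0, 0, 1, 2]

# Successor block of a call: r8 = g(r1)
b = add_dummies([Insn("op", dest=8, src=1)], pred_ends_with_call=True)
print(schedule(b))  # [0, 2]
```

The CPR consumer lands 2 cycles after the call, while the non-CPR consumer only waits the usual single cycle. In the successor block the delay is counted from the dummy rather than from the call itself, which errs on the safe side. A real list scheduler would also reorder instructions instead of issuing strictly in order.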