On 12/01/2015 02:11 PM, Steve Ellcey  wrote:

With the current top-of-tree we now generate:

        addiu   $4,$4,1
$L8:
        lbu     $3,-1($4)
        addiu   $5,$5,1
        beq     $3,$0,$L7
        lbu     $2,-1($5)  # This is a branch delay slot
        beq     $3,$2,$L8
        addiu   $4,$4,1    # This is a branch delay slot

        subu    $2,$3,$2   # Done only once now after exiting loop.

The main problem with the new loop is that the beq comparing $2 and $3
is right before the load of $2 so there can be a delay due to the time
that the load takes.  The ideal code would probably be:

I'd start by looking at the code prior to reorg/delay slot scheduling. You may be running into the well-known issue that reorg knows nothing about latency or scheduling and happily picks whatever insn can safely fill the delay slot. In doing so, reorg may muck up the schedule badly.

If that's the case, you might test disallowing operations with > 1 cycle latency in delay slots and see how that affects a wider range of benchmarks.
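
To make the experiment concrete, here is a rough sketch of the filtering idea in Python. It is not GCC code and does not reflect how reorg is actually implemented; the `Insn` type, the function names, and the latency numbers (2 cycles for the load, 1 for the add) are all hypothetical, and the candidates are assumed to have already passed the usual safety checks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction record; latency values are illustrative only."""
    text: str
    latency: int  # cycles before the result can be used


def eligible_for_delay_slot(insn: Insn, max_latency: int = 1) -> bool:
    # Reject anything whose result takes longer than max_latency cycles.
    return insn.latency <= max_latency


def pick_delay_slot_filler(candidates: Sequence[Insn],
                           max_latency: int = 1) -> Optional[Insn]:
    # Candidates are assumed to be safe to move already; this only adds
    # the latency restriction on top of that.
    for insn in candidates:
        if eligible_for_delay_slot(insn, max_latency):
            return insn
    return None  # nothing suitable; the caller would fall back to a nop


if __name__ == "__main__":
    candidates = [
        Insn("lbu     $2,-1($5)", latency=2),  # assumed load latency
        Insn("addiu   $5,$5,1", latency=1),
    ]
    chosen = pick_delay_slot_filler(candidates)
    print(chosen.text if chosen else "nop")
```

With these assumed latencies the load is skipped and the addiu is chosen; raising `max_latency` to 2 gives back the current behavior of taking the first safe insn.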

Jeff
