On 14/11/2018 22:30, Jeff Law wrote:
There's a particular case that has historically been problematical.

If you have this kind of sequence in the epilogue

        restore register using FP
        move fp->sp  (deallocates frame)
        return

Under certain circumstances the scheduler can swap the register restore
and move from fp into sp creating something like this:

        move fp->sp (deallocates frame)
        restore register using FP (reads from deallocated frame)
        return

That would normally be OK, except if you take an interrupt between the
first two instructions.  If interrupt handling is done without switching
stacks, then the interrupt handler may write into the just de-allocated
frame destroying the values that were saved in the prologue.

OK, so the barrier needs to be right before the stack pointer moves. I can do that. :-)

Presumably the same is true for prologues, except that the barrier needs to be after the stack adjustment.
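
To make the hazard concrete, here is a toy simulation of the two epilogue orderings. It is only a sketch under stated assumptions: the stack layout, the values, and the helper names (make_machine, interrupt, run) are all invented for illustration and do not model GCN or any real target. The interrupt handler is assumed to run on the same stack and to overwrite everything below the current stack pointer.

```python
# Toy model of the epilogue hazard: a downward-growing stack, an interrupt
# handler that runs on the same stack, and two orderings of the epilogue.
# All names and values here are hypothetical, chosen only for illustration.

STACK_WORDS = 16
SAVED_VALUE = 0xCAFE      # value the prologue saved in the frame
CLOBBER = 0xDEAD          # value the hypothetical interrupt handler writes


def make_machine():
    """Return a machine whose frame holds one saved register below FP."""
    stack = [0] * STACK_WORDS
    fp = 12                       # frame pointer (the caller's SP)
    sp = 8                        # stack pointer; the frame is stack[8:12]
    stack[fp - 1] = SAVED_VALUE   # the prologue saved a register at FP-1
    return {"stack": stack, "fp": fp, "sp": sp, "reg": None}


def interrupt(m):
    """Handler that does not switch stacks: it overwrites all words below SP."""
    for addr in range(0, m["sp"]):
        m["stack"][addr] = CLOBBER


def restore(m):
    m["reg"] = m["stack"][m["fp"] - 1]   # restore register using FP


def move_fp_to_sp(m):
    m["sp"] = m["fp"]                    # deallocates the frame


def run(sequence, interrupt_after):
    """Run the epilogue, taking an interrupt after the given instruction index."""
    m = make_machine()
    for i, insn in enumerate(sequence):
        insn(m)
        if i == interrupt_after:
            interrupt(m)
    return m["reg"]


# Interrupt taken between the first two instructions in both cases.
safe = run([restore, move_fp_to_sp], interrupt_after=0)
swapped = run([move_fp_to_sp, restore], interrupt_after=0)
print(hex(safe), hex(swapped))
```

In the original order the restore reads the saved value before the frame is released, so the interrupt is harmless. In the swapped order the frame is already below SP when the interrupt arrives, and the restore reads whatever the handler left there. A barrier placed right before the stack pointer move is what prevents the scheduler from producing the second ordering.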

You may not need to worry about that today on the GCN port, but you
really want to fix it now so that it's never a problem.  You *really*
don't want to have to debug this kind of problem in the wild.  Been
there, done that, more than once :(

I'm not exactly sure how interrupts work on this platform -- we've had no use for them yet -- but without a debugger, and with up to 1024 threads running simultaneously, you can be sure I don't want to debug it!

I would hazard a guess that combine saw the one without the use as
"simpler" and preferred it.  I think you've made a bit of a fundamental
problem with the way the EXEC register is being handled.  Hopefully you
can get by with some magic UNSPEC wrappers without having to do too much
surgery.

Exactly so. An initial experiment with combine re-enabled has not shown any errors, so it's possible the problem has gone away, but I've not been over the full testsuite yet (and you wouldn't expect actual failures anyway).

In future, I'd like to have the scheduler insert real instructions into
these slots, but that's very much on the to-do list.

If you can model this as a latency between the two points where you
need to insert the nops, then the scheduler will fill in what it can.
But it doesn't generally handle non-interlocked processors.  So you'll
still want your little pass to fix things up when the scheduler couldn't
find useful work to schedule into those bubbles.

Absolutely, the scheduler is about optimization and this md_reorg pass is about correctness.
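
As a rough illustration of that split, the following sketch pads an already-scheduled instruction list with nops wherever a consumer still sits too close to its producer. It is not the actual md_reorg pass from the port: the tuple representation, the insert_nops name, and the single min_distance parameter are assumptions made for the example, and a real pass would work on RTL insns with per-instruction hazard rules.

```python
def insert_nops(insns, min_distance):
    """Pad with "nop" so each consumer is far enough from its producer.

    insns: list of (name, dest, sources) tuples in scheduled order, where
        dest is a register name or None and sources is a tuple of names.
    min_distance: hypothetical minimum difference in position between the
        instruction that writes a register and the first one that reads it
        (1 means back-to-back is fine, 3 means two slots in between).
    """
    out = []
    last_write = {}  # register -> index in `out` of its most recent writer
    for name, dest, sources in insns:
        # Work out how many slots are still missing for the closest producer.
        needed = 0
        for src in sources:
            if src in last_write:
                gap = len(out) - last_write[src]
                needed = max(needed, min_distance - gap)
        # The correctness fix-up: fill whatever the scheduler left empty.
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(name)
    return out


# The scheduler found nothing to put between producer and consumer.
unfilled = insert_nops(
    [("def_a", "a", ()), ("use_a", "b", ("a",))], min_distance=3)

# The scheduler found one independent instruction; one bubble remains.
partly_filled = insert_nops(
    [("def_a", "a", ()), ("indep", "c", ()), ("use_a", "b", ("a",))],
    min_distance=3)

print(unfilled)
print(partly_filled)
```

The point of the design is that the pass never relies on the scheduler having done anything: if the latency model lets the scheduler fill a slot with useful work, fewer nops are emitted, and if it fills none, the output is still correct.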

I have no idea whether the architecture has those issues or not.

The guideline I would give to determine if you're vulnerable...  Do you
have speculation (including the ability to speculate past a memory
operation), branch prediction, memory caches and a high-resolution timer
(i.e., something like a cycle timer)?  If you've got those, then the
processor is likely vulnerable to a Spectre v1 style attack.  Those are
the basic building blocks.

We have cycle timers and caches, but I'll have to ask AMD about the other details.

Andrew
