On 11/12/18 5:53 AM, Andrew Stubbs wrote:
> On 09/11/2018 19:39, Jeff Law wrote:
>>> +
>>> +/* Generate epilogue.  Called from gen_epilogue during
>>> +   pro_and_epilogue pass.
>>> +
>>> +   See gcn_expand_prologue for stack details.  */
>>> +
>>> +void
>>> +gcn_expand_epilogue (void)
>> You probably need a barrier in here to ensure that the scheduler doesn't
>> move an aliased memory reference into the local stack beyond the stack
>> adjustment.
>>
>> You're less likely to run into it because you eliminate frame pointers
>> fairly aggressively, but it's still the right thing to do.
>
> Sorry, I'm not sure I understand what the problem is?  How can this
> happen?  Surely the scheduler wouldn't change the logic of the code?

There's a particular case that has historically been problematical.
If you have this kind of sequence in the epilogue

  restore register using FP
  move fp->sp               (deallocates frame)
  return

Under certain circumstances the scheduler can swap the register restore
and the move from fp into sp, creating something like this:

  move fp->sp               (deallocates frame)
  restore register using FP (reads from deallocated frame)
  return

That would normally be OK, except if you take an interrupt between the
first two instructions.  If interrupt handling is done without switching
stacks, then the interrupt handler may write into the just de-allocated
frame, destroying the values that were saved in the prologue.

You may not need to worry about that today on the GCN port, but you
really want to fix it now so that it's never a problem.  You *really*
don't want to have to debug this kind of problem in the wild.  Been
there, done that, more than once :(

>> This seems a bit hokey.  Why specifically is combine removing the USE?
>
> I don't understand combine fully enough to explain it now, although at
> the time I wrote this, and in a GCC 7 code base, I had followed the code
> through and observed what it was doing.
>
> Basically, if you have two patterns that do the same operation, but one
> has a "parallel" with an additional "use", then combine will tend to
> prefer the one without the "use".  That doesn't stop the code working,
> but it makes a premature (accidental) decision about instruction
> selection that we'd prefer to leave to the register allocator.
>
> I don't recall if it did this to lone instructions, but it would
> certainly do so when combining two (or more) instructions, and IIRC
> there are typically plenty of simple moves around that can be easily
> combined.

I would hazard a guess that combine saw the one without the use as
"simpler" and preferred it.

I think you've got a bit of a fundamental problem with the way the EXEC
register is being handled.  Hopefully you can get by with some magic
UNSPEC wrappers without having to do too much surgery.

>>> +  /* "Manually Inserted Wait States (NOPs)."
>>> +
>>> +     GCN hardware detects most kinds of register dependencies, but there
>>> +     are some exceptions documented in the ISA manual.  This pass
>>> +     detects the missed cases, and inserts the documented number of NOPs
>>> +     required for correct execution.  */
>> How unpleasant :(  But if it's what you need to do, so be it.  I'll
>> assume the compiler is the right place to do this -- though some ports
>> handle this kind of stuff in the assembler or linker.
>
> We're using an LLVM assembler and linker, so we have tried to use them
> as is, rather than making parallel changes that would prevent GCC
> working with the last numbered release of LLVM (see the workaround for
> assembler bugs in the BImode move instruction).
>
> Expecting the assembler to fix this up would also throw off the
> compiler's offset calculations, and the near/far branch instructions
> have different register requirements that it's better for the compiler
> to know about.
>
> The MIPS backend also inserts NOPs in a similar way.

MIPS isn't that simple.  If you're not in a .reorder block and you don't
have interlocks, then it leaves it to the assembler...

If you have near/far branch calculations, then those have to be aware of
the nop insertions, and you're generally better off doing them both in
the same tool.  You've chosen the compiler.  It's a valid choice, but it
does have some downsides.  The assembler is a valid choice too, with a
different set of downsides.
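For reference, a minimal sketch of the shape such a fixup pass can take,
assuming it is run from TARGET_MACHINE_DEPENDENT_REORG and that the port
defines a "nop" pattern; the function name and the hazard query
gcn_required_nops below are illustrative only, not the port's actual code:

  /* Walk the insn stream and insert the wait states the ISA manual
     requires between instructions whose dependency the hardware does
     not detect by itself.  */
  static void
  gcn_insert_wait_states (void)
  {
    rtx_insn *insn, *prev = NULL;

    for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
      {
        if (!NONDEBUG_INSN_P (insn))
          continue;

        if (prev)
          {
            /* Illustrative query: how many NOPs does the ISA manual
               require between PREV and INSN?  */
            int nops = gcn_required_nops (prev, insn);

            while (nops-- > 0)
              emit_insn_before (gen_nop (), insn);
          }
        prev = insn;
      }
  }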
> In future, I'd like to have the scheduler insert real instructions into
> these slots, but that's very much on the to-do list.

If you can model this as a latency between the two points where you need
to insert the nops, then the scheduler will fill in what it can.  But it
doesn't generally handle non-interlocked processors.  So you'll still
want your little pass to fix things up when the scheduler couldn't find
useful work to schedule into those bubbles.

> Oh, OK. :-(
>
> I have no idea whether the architecture has those issues or not.

The guideline I would give to determine if you're vulnerable...  Do you
have speculation, including the ability to speculate past a memory
operation, branch prediction, memory caches and a high resolution timer
(ie, like a cycle timer)?  If you've got those, then the processor is
likely vulnerable to a spectre V1 style attack.  Those are the basic
building blocks.

Jeff
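As an illustration of the V1 pattern described above, the classic
bounds-check-bypass gadget has roughly this shape (the array and size
names are placeholders, not anything from the port):

  #include <stddef.h>
  #include <stdint.h>

  extern uint8_t array1[16];
  extern unsigned array1_size;
  extern uint8_t array2[256 * 512];

  uint8_t
  victim (size_t x)
  {
    /* If the branch predictor guesses "in bounds" while x is actually
       out of range, both loads can execute speculatively; the second
       load's cache footprint then depends on the out-of-bounds byte and
       can be recovered later with a high-resolution timer.  */
    if (x < array1_size)
      return array2[array1[x] * 512];
    return 0;
  }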