On Wed, 3 May 2023, Jiaxun Yang wrote:

> Since it’s possible to run an R2- binary on an R2+ processor, we’d better
> find semantics that do eliminate speculation on all processors. As SSNOP
> behaviour on R2+ processors is pretty much undefined, there is no guarantee
> that an SSNOP sequence can eliminate speculation.

 Not exactly undefined on R2+: SSNOP is still required to single-issue, so 
it does act as an execution barrier.  Good point otherwise.

 Both EHB and J[AL]R.HB are backwards compatible however (except for an 
obscure 4Kc J[AL]R.HB erratum I came across once and which may no longer 
be relevant), so I think the legacy sequence ought to just return via 
JR.HB as well, thereby providing the required semantics with newer 
hardware.  If it does trap on the 4Kc, then the OS can emulate it (and we 
can ignore it for bare metal, deferring to whoever might be interested in 
a workaround).

> My proposal is that for R2- CPUs we do a dummy syscall to act as an
> instruction hazard barrier; since an exception must clear the pipeline,
> this should hold for all known implementations.

 I think the SSNOP approach should be sufficient.

> The most lightweight syscall I know is to do a MIPS_ATOMIC_SET with
> sysmips. A dummy variable on the stack should do the trick. Do let me know
> if there is a better option.

 That would have to be gettimeofday(2) then, the most performance-critical 
one, and also one that does not have side effects.  The real syscall and 
not the vDSO emulation of course (there's a reason it's emulated via the 
vDSO, which is exactly our reason too).
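
 As a rough illustration of what "the real syscall" means here, the sketch 
below enters the kernel through syscall(2) rather than through the libc 
gettimeofday() wrapper, which would normally be served by the vDSO without 
taking an exception.  It assumes Linux with a libc that defines 
SYS_gettimeofday; the helper name syscall_barrier_sketch is made up for 
this example and is not an existing interface.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (name invented for this sketch).
 * Forces a real kernel entry by issuing the gettimeofday syscall
 * directly, bypassing the vDSO fast path used by the libc wrapper.
 * Assumes Linux and that SYS_gettimeofday is defined for the target.
 * Returns 0 on success, -1 on failure (errno set by syscall()).
 */
static int syscall_barrier_sketch(void)
{
    struct timeval tv;  /* result is discarded; only the trap matters */

    return (int)syscall(SYS_gettimeofday, &tv, NULL);
}
```

 Whether the exception taken this way is a sufficient barrier on a given 
implementation is exactly the open question in this thread; the code only 
shows how to avoid the vDSO path.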

> I have a vague memory about a discussion finding that an exception does
> not imply a memory barrier, so perhaps we still need a SYNC preceding that
> syscall.

 There is no claim that I could find in the architecture specification 
saying that taking an exception implies a memory barrier and therefore we 
must conclude it does not.  Likewise executing ERET.

 As I say I think the SSNOP approach should be sufficient, along with 
relying on SYNC emulation.
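
 A minimal sketch of what such a legacy sequence could look like from C 
with GCC-style inline assembly follows.  Several things in it are 
assumptions rather than anything settled here: the helper name is made up, 
the count of seven SSNOPs is simply the longest sequence mentioned in this 
thread, and the trailing EHB is included only because it is backwards 
compatible.  The final return via JR.HB is left out, as it would replace 
the function's own return jump and is better written as a standalone 
assembly routine.  On non-MIPS targets the assembly is compiled out and 
only a compiler barrier remains.

```c
/*
 * Hypothetical helper (name invented for this sketch).
 * SSNOP and EHB are written as their SLL encodings so that the block
 * assembles at any MIPS ISA level:
 *   sll $0, $0, 1  is SSNOP
 *   sll $0, $0, 3  is EHB
 */
static inline void legacy_barrier_sketch(void)
{
    /* Full memory barrier; GCC emits SYNC for this on MIPS, which
       pre-MIPS II hardware would need the OS to emulate. */
    __sync_synchronize();

#if defined(__mips__)
    __asm__ __volatile__(
        ".set push\n\t"
        ".set noreorder\n\t"
        "sll $0, $0, 1\n\t"     /* SSNOP x7: assumed count */
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 3\n\t"     /* EHB */
        ".set pop"
        : : : "memory");
#else
    /* Not MIPS: compiler-level barrier only. */
    __asm__ __volatile__("" : : : "memory");
#endif
}
```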

> > I think there may be no authoritative source of information here, this is 
> > a grey area.  The longest SSNOP sequences I have seen were for the various 
> > Broadcom implementations and counted 7 instructions.  Both the Linux 
> > kernel and the CFE firmware have them.
> 
> Was it for SiByte or BMIPS?

 Both AFAICT.

> > Also we may not be able to fully enforce ordering for the oldest devices 
> > that do not implement SYNC, as this is system-specific, e.g. involving 
> > branching on the CP0 condition with the BC0F instruction, and inventing an 
> > OS interface for that seems unreasonable at this point of history.
> 
> I guess this is not a valid concern for user space applications?
> As per the R4000 manual, BC0F will raise a “Coprocessor Unusable”
> exception, and it’s certain that we have Status.CU0 = 0 in user space.

 Exactly, which is why an OS service would have to provide the required 
semantics to the userland, and none might be available.  And we probably 
do not care anyway, because I gather this is a security feature to prevent 
certain types of data leaks via a side channel.  I wouldn't expect anyone 
doing any serious security-sensitive processing with legacy MIPS hardware.  
And then there's no speculative execution with all these pieces of legacy 
hardware (R3000, eh?) that have no suitable barriers provided, mostly 
because they are not relevant for them anyway.

 For bare metal we probably do not care about such legacy hardware either 
way.

 Overall I'd say let's do the best we can without bending over backwards 
and then rely on people's common sense.

  Maciej
