Hi all, please cc me on replies. Hardware discussion may be found here: https://groups.google.com/forum/?nomobile=true#!topic/comp.arch/mzXXTU2GUSo
I am designing a new processor, based on RISC-V, that is intended as a hybrid GPU, VPU and CPU. For various reasons, it needs to be a multi-issue Out of Order engine. The innocent question was therefore asked, "how is Spectre to be dealt with?" which threw a massive spanner in the works.

The processor is being designed to use multi-issue as a means to implement Vector Processing. For example: for predicated elements, several instructions (one per element) will be thrown into the *standard* multi-issue instruction queue, and cancelled only when the register containing the predicate mask is available and has been decoded. Thus, resources are taken up that will affect and be affected by other instructions, which is the very definition of Spectre timing attacks. Oops.

Standard Spectre mitigation would completely destroy the performance and viability of the project's Vector Engine, as well as many other features.

So I have a proposal that, if correct and implemented, may be adopted by other architectures as a mitigation solution that allows out of order to continue to be used. It is a collaborative solution that specifically requires explicit instructions to be added (and called) at the appropriate time(s).

The issue with Spectre attacks is that untrusted code may cause past OR FUTURE instructions to change the amount of time in which they will complete. An in-order architecture does not have this problem (except where pipeline stalls occur), as there are always [almost always] enough resources available to allow instructions (pipelines) to proceed without blocking. OoO typically has resource bottlenecks that are affected by other instructions. The whole POINT of an OoO design is to run ahead, utilising these resources speculatively and, duh, out of order.

To deal with absolutely every possible flaw in the OoO paradigm is a total nightmare. Performance, as people are discovering, is utterly trashed. Code complexity, in both software and hardware terms, goes mental.
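To make the predicated-element example earlier in this message a bit more concrete, here is a toy Python model of a shared issue queue. It is only a sketch under assumptions, not the real design: the function name `simulate`, the slot counts, and the simplification that every queued element op is resolved (cancelled or dispatched) in the cycle the mask becomes known are all made up for illustration. What it shows is that the cycle at which an unrelated later instruction gets a queue slot depends on when the predicate mask arrives.

```python
from collections import deque


def simulate(queue_slots, mask, mask_ready_cycle, max_cycles=1000):
    """Toy model (illustrative assumption, not the actual hardware).

    One op per vector element is pushed into a shared issue queue and held
    there until the predicate mask is decoded at `mask_ready_cycle`.
    Returns (cycle at which a later scalar op finds a free slot,
             slot-cycles spent holding element ops that end up cancelled).
    """
    waiting = deque(range(len(mask)))  # element ops not yet in the queue
    queue = []                         # element ops currently holding a slot
    scalar_issue_cycle = None
    wasted_slot_cycles = 0

    for cycle in range(max_cycles):
        if cycle >= mask_ready_cycle:
            # Mask is known: masked-off ops are cancelled, the rest are
            # dispatched. Either way their queue slots are released.
            queue = []

        # Issue in program order: vector element ops first...
        while waiting and len(queue) < queue_slots:
            queue.append(waiting.popleft())

        # ...then the unrelated scalar op, once a slot is free.
        if scalar_issue_cycle is None and not waiting and len(queue) < queue_slots:
            scalar_issue_cycle = cycle

        # Slots held this cycle by ops that will eventually be cancelled.
        wasted_slot_cycles += sum(1 for elem in queue if not mask[elem])

        if scalar_issue_cycle is not None and not waiting and not queue:
            return scalar_issue_cycle, wasted_slot_cycles

    raise RuntimeError("simulation did not finish within max_cycles")


if __name__ == "__main__":
    mask = [1, 0, 1, 0]  # elements 1 and 3 are predicated off
    for ready in (1, 3, 5):
        print(ready, simulate(queue_slots=4, mask=mask, mask_ready_cycle=ready))
```

In this model, the later scalar op cannot issue while the queue is full, so its issue cycle tracks the arrival of the mask. That dependency is the observable timing difference.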
Intel had to REMOVE hyperthreading from some of its latest processors: the crossover timing leakage is that bad.

There is another way to ensure that untrusted code cannot affect secure code: clear out the "internal state" of the processor before letting it proceed to run the untrusted code. In this way it becomes impossible for untrusted code to ascertain the state of the processor, because it has been reset back to a known uniform (blank) state.

This REQUIRES an actual instruction that programs (and the kernel) may call. It is NOT ENOUGH that the Linux kernel try to deal with absolutely every possible situation automatically, and it is a total nightmare to even try. It is also not enough that the hardware try to deal with this on its own: that is insanely complex as well. The only really safe way is to abandon all of the benefits of OoO and go back to in-order SINGLE issue performance levels.

Clearly, none of these options is viable or acceptable.

A hybrid solution is a reasonable compromise, one that may even be possible to implement right now: on processors that do not have the proposed new instruction, code would issue sufficient NOPs (or other suitably researched instructions) to create a "processor internal state" firebreak between secure and untrusted code.

The hardware version of the firebreak opcode would WAIT until the processor internal state has cleared out. All outstanding speculative instructions would be cancelled. All instructions waiting for pipelines to complete would be waited for until they had completed, and their results written to the register file. Only then would the processor be allowed to proceed.

It is not enough to have these "firebreak" calls done automatically by the Linux kernel: they need to be part of standard applications. An example is Firefox, which has a single process for JavaScript.
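As a rough illustration of the hardware firebreak behaviour described above (cancel speculative work, wait for in-flight instructions, write results back, only then proceed), here is a toy Python model. It is a sketch under assumptions, not a hardware design: `ToyCore`, `InFlightOp`, the cycle counts and the register names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InFlightOp:
    dest: str                  # destination register name (illustrative)
    value: int                 # result to be written back
    cycles_left: int           # cycles until the pipeline produces the result
    speculative: bool = False  # True if the op may still be thrown away


@dataclass
class ToyCore:
    regs: dict = field(default_factory=dict)
    in_flight: list = field(default_factory=list)

    def tick(self):
        """Advance one cycle; write back non-speculative ops that finish."""
        still_running = []
        for op in self.in_flight:
            op.cycles_left -= 1
            if op.cycles_left <= 0 and not op.speculative:
                self.regs[op.dest] = op.value
            else:
                still_running.append(op)
        self.in_flight = still_running

    def firebreak(self):
        """Hypothetical firebreak: returns the number of cycles waited."""
        # 1. Cancel all outstanding speculative instructions.
        self.in_flight = [op for op in self.in_flight if not op.speculative]
        # 2. Wait for every remaining in-flight instruction to complete and
        #    write its result to the register file.
        waited = 0
        while self.in_flight:
            self.tick()
            waited += 1
        # 3. Only now may execution proceed: nothing is left in flight.
        return waited


if __name__ == "__main__":
    core = ToyCore()
    core.in_flight = [
        InFlightOp("x5", 42, cycles_left=3),
        InFlightOp("x6", 7, cycles_left=10, speculative=True),
    ]
    print(core.firebreak(), core.regs)  # prints: 3 {'x5': 42}
```

An application such as a browser would make the equivalent call just before handing control to untrusted code, so that the untrusted code starts from an empty in-flight state.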
Spectre attacks have been shown to exist using untrusted arbitrary JavaScript, and if that JavaScript is being executed by a single process, then it is the responsibility of that process to call the "firebreak" just before allowing the untrusted JavaScript to execute.

This is going to be a mammoth task. The alternative is to continue as things are, which is a mess that cannot be cleaned up by hardware alone or by software alone.

Thoughts and feedback appreciated.

l.