On 16 Aug 2023, at 00:22, Demi Marie Obenour <demioben...@gmail.com> wrote:

I’d claim they don’t help at all.  They might if “random” were truly random, 
which in hardware is rarely the case. “Random” cache replacement is more 
properly described as “unspecified”. In reality it usually draws on some 
existing state (e.g. a counter) and as such is actually fully deterministic; 
an attacker only has to reverse-engineer the mechanism.
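To illustrate the point, here is a minimal sketch (my own assumed model of a typical implementation, not any specific CPU) of a “random” replacement policy whose entropy source is just a free-running counter. Two runs over the same access trace produce identical evictions, which is exactly what makes such a policy reverse-engineerable:

```python
class CounterRandomCache:
    """Fully-associative cache with counter-based 'random' replacement.

    Hypothetical model: the 'randomness' is a free-running counter that
    ticks on every access, so victim selection is fully deterministic.
    """

    def __init__(self, num_lines):
        self.lines = [None] * num_lines
        self.counter = 0  # the entropy 'source': a plain counter

    def access(self, tag):
        self.counter += 1  # ticks on every access, hit or miss
        if tag in self.lines:
            return "hit"
        # Miss: victim is just counter mod associativity -- no entropy at all.
        victim = self.counter % len(self.lines)
        self.lines[victim] = tag
        return f"miss, evicted way {victim}"


trace = [0x10, 0x20, 0x30, 0x40, 0x50, 0x10]
run1 = [CounterRandomCache(4).access(t) for t in trace]
run2 = [CounterRandomCache(4).access(t) for t in trace]
# run1 == run2: same trace, same evictions, every time.
```

Once the counter’s behaviour is known, an attacker can predict exactly which line a given access sequence will evict, defeating the purported randomisation.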

Besides that, fully-associative caches (unless very small) are inherently slow 
(long critical path) and energy-hungry (many gates to traverse).

What about Chameleon Cache (https://arxiv.org/pdf/2209.14673.pdf)?

I had a look, and the paper pretty much confirms what I’ve said. (Besides being 
only a proposal…)

For one, they don’t actually implement an FA cache, nor true randomness; they 
approximate both in some fashion such that the effect is claimed to be 
basically the same (I didn’t read it carefully enough to confirm that). Note 
also that the paper is published in a third-tier venue I hadn’t even heard of 
before – the IEEE International Symposium on Secure and Private Execution 
Environment Design.

It does all this by making the hardware more complicated (and thus more likely 
to be buggy/attackable). Specifically, they take the IMHO wrong approach of 
trying to solve the whole problem in hardware, discarding alternative 
approaches as “involv[ing] software to manage the partitions”. As I keep 
saying, security is inherently a HW-SW-codesign problem, you won’t get good 
solutions by trying to do it all in HW or in SW. The right approach is 
policy-mechanism separation: HW provides minimal (and ideally orthogonal) 
mechanisms, and SW uses them according to its security policy.

The HW-only approach means that the HW must take a pessimistic approach and 
prevent information flow across anything that looks like it might be a security 
boundary, whether or not it actually is (which only the OS can know). This has 
a (generally measurable) cost.

Here this cost is that the cache will likely be slower (in terms of access), 
likely more energy-hungry and have reduced hit rates compared to, say, a 4-way 
LRU cache. This will be a constant overhead, whether it’s needed or not. In 
contrast, flushing the L1 is a cost that is only needed when an actual 
security-domain switch happens. It is small (fraction of a µs), negligible 
unless domain-switch rates are unusually high.
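A quick back-of-envelope sketch makes the trade-off concrete (the flush cost and switch rates below are illustrative assumptions, not measurements):

```python
def flush_overhead(flush_cost_us, switches_per_sec):
    """Fraction of CPU time spent flushing the L1 on security-domain switches."""
    return flush_cost_us * 1e-6 * switches_per_sec


# Assumed flush cost: 0.2 us (a fraction of a microsecond, as stated above).
# At 1,000 domain switches per second the overhead is 0.02% -- negligible.
low = flush_overhead(0.2, 1_000)     # 0.0002, i.e. 0.02 %

# Only at an unusually high switch rate (say 100,000/s) does it reach 2%.
high = flush_overhead(0.2, 100_000)  # 0.02, i.e. 2 %
```

The point: the software-flush cost scales with the domain-switch rate and vanishes when no switching happens, whereas a complex HW-only design pays its overhead on every access, needed or not.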

There is a trade-off between such per-domain-switch costs vs constant overheads 
resulting from a complex design, but the paper doesn’t attempt to evaluate 
this. In fact, their evaluation commits the benchmarking crime of improper 
baseline (D1 of van der Kouwe et al [2020], see 
https://trustworthy.systems/publications/full_text/vanderKouwe_HABG_20.pdf or 
also my list https://gernot-heiser.org/benchmarking-crimes.html#base). They 
only compare against other complex, HW-only schemes.

Gernot
_______________________________________________
Devel mailing list -- devel@sel4.systems
To unsubscribe send an email to devel-leave@sel4.systems
