On 07/06/2021 18:06, Richard Henderson wrote:
On 6/7/21 12:29 PM, Bruno Piazera Larsen wrote:
I just tried sending mmu_idx all the way down, but I ran into a very
weird bug of gcc. If we have to add one more parameter that GCC can't
just optimize away we get at least a slow down of 5x for the first
test of check-acceptance (could be more, but the test times out after
900 seconds, so I'm not sure).
That's odd. We already have more arguments than the number of
argument registers... A 5x slowdown is distinctly odd.
I did some more digging and the problem is not with
ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which
currently has 7 arguments and we're increasing to 8. 7 feels like the
correct number, but I couldn't find docs supporting it, so I could be
wrong.
One way that I managed to get around that is saving the current MSR,
setting it to 5, and restoring after the xlate call. The code ended
up something like:
int new_idx = (5<<HFLAGS_IMMU_IDX) | (5<<HFLAGS_DMMU_IDX);
int clr = (7<<HFLAGS_IMMU_IDX) | (7<<HFLAGS_DMMU_IDX);
int old_idx = env->msr & clr;
clr = ~clr;
/* set new msr so we don't need to send the mmu_idx */
env->msr = (env->msr & clr) | new_idx;
ret = ppc_radix64_partition_scoped_xlate(...);
/* restore old mmu_idx */
env->msr = (env->msr & clr) | old_idx;
No, this is silly.
We need to do one of two things:
- make sure everything is inlined,
- reduce the number of arguments.
We're currently passing in 9 arguments, which really is too many
already. We should be using something akin to mmu_ctx_t, but probably
specific to radix64 without the random stuff collected for random
other mmu models.
That means we'd have to define radix_ctx_t (or however we call it) in
radix64.h, setup the struct on ppc_xlate, then pass it to ppc_radix64_xlate.
From looking at the code, it seems the most useful bits to put in the
struct are: eaddr, g_addr, h_addr, {h,g}_prot, {g,h}_page_size, mmu_idx
and guest_visible. They all seem reasonable to me, but I might be
missing something again.
r~
Another question: I know hash mmus don't have this problem, but since
ppc_jumbo_xlate also uses mmu_idx, should we make those xlate user
mmu_idxs as well? I tested and it doesn't make a time difference.
--
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>