On 08/06/2021 13:37, Bruno Piazera Larsen wrote:
> On 08/06/2021 12:35, Richard Henderson wrote:
>> On 6/8/21 7:39 AM, Bruno Piazera Larsen wrote:
>>>> That's odd. We already have more arguments than the number of
>>>> argument registers... A 5x slowdown is distinctly odd.
>>> I did some more digging and the problem is not with
>>> ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which
>>> currently has 7 arguments and we're increasing to 8. 7 feels like
>>> the correct number, but I couldn't find docs supporting it, so I
>>> could be wrong.
>> According to tcg/ppc/tcg-target.c.inc, there are 8 argument registers
>> for ppc hosts. But now I see you didn't actually say on which host
>> you observed the problem... It's 6 argument registers for x86_64 host.
> Oh, yes, sorry. I'm experiencing it in a POWER9 machine (ppc64le
> architecture). According to tcg this shouldn't be the issue, then, so
> idk if that's the real reason or not. All I know is that as soon as
> gcc can't optimize an argument away it happens (fprintf in
> radix64_xlate, using one of the mmuidx_* functions, defining those as
> macros).
> I'll test it in my x86_64 machine and see if such a slowdown happens.
> It's not conclusive evidence, but the function is too complex for me
> to follow the disassembly if I can avoid it...
I ran the test: the slowdown also happens on the x86_64 machine (but
without the change it already takes 360s, so I don't know if the
slowdown is as dramatic there), so it's _probably_ not caused by going
over the argument register count. I have no clue what it could be. I'm
still working on the struct version to see if anything changes.
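
For reference, this is roughly the shape of the "struct version" idea, as
a minimal sketch only. Everything in it is hypothetical: xlate_many_args,
xlate_struct, XlateArgs and the field names are made up for illustration
and are not the real ppc_radix64_xlate signature or logic. The first
function takes eight separate arguments; the second takes a single pointer
to a struct that bundles the same inputs and outputs, so only one argument
register is needed at the call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: eight separate arguments, three of them outputs. */
static int xlate_many_args(uint64_t eaddr, int access_type, uint64_t lpid,
                           bool relocation, uint64_t *raddr, int *psize,
                           int *prot, int mmu_idx)
{
    /* Toy computation, only here so the two variants can be compared. */
    *raddr = relocation ? (eaddr ^ (lpid << 12)) : eaddr;
    *psize = 12;
    *prot = access_type | (mmu_idx << 4);
    return 0;
}

/* Hypothetical bundle of the same inputs and outputs. */
typedef struct XlateArgs {
    /* inputs */
    uint64_t eaddr;
    uint64_t lpid;
    int access_type;
    int mmu_idx;
    bool relocation;
    /* outputs */
    uint64_t raddr;
    int psize;
    int prot;
} XlateArgs;

/* Same toy computation, but the callee receives a single pointer. */
static int xlate_struct(XlateArgs *a)
{
    assert(a != NULL);
    a->raddr = a->relocation ? (a->eaddr ^ (a->lpid << 12)) : a->eaddr;
    a->psize = 12;
    a->prot = a->access_type | (a->mmu_idx << 4);
    return 0;
}
```

A caller would fill an XlateArgs on its stack, pass its address, and read
raddr, psize and prot back from the struct. Whether this changes the
timing at all is exactly what still has to be measured.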
--
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee