On Mon, 22 May 2023, Alex Bennée wrote:
BALATON Zoltan <bala...@eik.bme.hu> writes:
The low-level extract and deposit functions provided by bitops.h are
used in performance-critical places. They crept into target/ppc via
FIELD_EX64 and are also used by softfloat, so PPC code that uses the FPU
heavily, where hardfloat is also disabled, is doubly affected.
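For reference, here is a minimal standalone sketch of assert-guarded extract64()/deposit64() helpers. It is modelled on the return expression from include/qemu/bitops.h quoted in the dump below, but the exact assert wording in QEMU's header may differ, so treat it as an approximation rather than the real code. The point is that every call carries a range check in addition to the shift and mask.

```c
#include <assert.h>
#include <stdint.h>

/* Approximation of QEMU's bitops.h helper; not a verbatim copy. */
static inline uint64_t extract64(uint64_t value, int start, int length)
{
    /* Range check performed on every call: the overhead being discussed. */
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Approximation of QEMU's bitops.h helper; not a verbatim copy. */
static inline uint64_t deposit64(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    /* Mask covering bits [start, start + length). */
    mask = (~0ULL >> (64 - length)) << start;
    /* Clear the field in value, then insert the low bits of fieldval. */
    return (value & ~mask) | ((fieldval << start) & mask);
}
```

Because the assert guarantees 0 < length <= 64 - start, the shift by (64 - length) always stays within 0..63 and is well defined.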
Most of these asserts compile out to nothing if the compiler is able to
verify that the constants are in range. For example, examining
the start of float64_add:
Dump of assembler code for function float64_add:
../../fpu/softfloat.c:
1979 {
0x00000000007ac9b0 <+0>: movabs $0xfffffffffffff,%r9
0x00000000007ac9ba <+10>: push %rbx
/home/alex/lsrc/qemu.git/include/qemu/bitops.h:
396 return (value >> start) & (~0ULL >> (64 - length));
0x00000000007ac9bb <+11>: mov %rdi,%rcx
0x00000000007ac9be <+14>: shr $0x34,%rcx
0x00000000007ac9c2 <+18>: and $0x7ff,%ecx
../../fpu/softfloat.c:
1979 {
0x00000000007ac9c8 <+24>: sub $0x30,%rsp
/home/alex/lsrc/qemu.git/include/qemu/bitops.h:
396 return (value >> start) & (~0ULL >> (64 - length));
0x00000000007ac9cc <+28>: mov %fs:0x28,%rax
0x00000000007ac9d5 <+37>: mov %rax,0x28(%rsp)
0x00000000007ac9da <+42>: mov %rdi,%rax
0x00000000007ac9dd <+45>: and %r9,%rdi
../../fpu/softfloat.c:
588 *r = (FloatParts64) {
0x00000000007ac9e0 <+48>: mov %ecx,0x4(%rsp)
0x00000000007ac9e4 <+52>: mov %rdi,0x8(%rsp)
/home/alex/lsrc/qemu.git/include/qemu/bitops.h:
396 return (value >> start) & (~0ULL >> (64 - length));
0x00000000007ac9e9 <+57>: shr $0x3f,%rax
../../fpu/softfloat.c:
588 *r = (FloatParts64) {
0x00000000007ac9ed <+61>: mov %al,0x1(%rsp)
589 .cls = float_class_unclassified,
590 .sign = extract64(raw, f_size + e_size, 1),
0x00000000007ac9f1 <+65>: mov %rax,%r8
I don't see any check and abort steps because all the shift and mask
values are known at compile time. The softfloat compilation certainly
does have some assert points though:
readelf -s ./libqemu-ppc64-softmmu.fa.p/fpu_softfloat.c.o |grep assert
136: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND g_assertion_mess[...]
138: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __assert_fail
but the references are from the ISRA segments, so it's tricky to know
whether they get used or are just there for LTO purposes.
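The dump above is consistent with the checks folding away: the exponent extract becomes just `shr $0x34` (52) followed by `and $0x7ff` (an 11-bit mask), with no call to an assert handler. The sketch below mirrors that unpack step with constant field sizes for an IEEE double. The `Parts64` struct is a hypothetical, simplified stand-in for softfloat's FloatParts64, and whether a given compiler actually drops each assert depends on inlining and optimisation level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline uint64_t extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Hypothetical simplified stand-in for softfloat's FloatParts64. */
typedef struct {
    bool sign;
    int exp;
    uint64_t frac;
} Parts64;

/* IEEE double layout: 52 fraction bits, 11 exponent bits, 1 sign bit. */
enum { F_SIZE = 52, E_SIZE = 11 };

static Parts64 unpack_raw64(uint64_t raw)
{
    /*
     * start and length are compile-time constants here, so once
     * extract64() is inlined the compiler can prove each assert condition
     * true and keep only the shift/mask sequence.
     */
    return (Parts64) {
        .sign = extract64(raw, F_SIZE + E_SIZE, 1),
        .exp  = (int)extract64(raw, F_SIZE, E_SIZE),
        .frac = extract64(raw, 0, F_SIZE),
    };
}
```

For example, unpacking the bit pattern of 1.0 (0x3FF0000000000000) yields sign 0, exponent 1023 and fraction 0.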
If there are hot paths in which the extract/deposit functions show up, I
suspect a better approach would be to implement _nocheck variants (or
maybe _noassert?) and use them where required, rather than turning off
the assert checking for these utility functions.
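A _nocheck variant along the lines suggested above might be sketched as follows. The name extract64_nocheck is hypothetical (no such function is assumed to exist in QEMU); the checked helper keeps its assert and delegates to the unchecked one, so only call sites that explicitly opt in skip the check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical unchecked variant: the caller guarantees
 * start >= 0 and 0 < length <= 64 - start.
 */
static inline uint64_t extract64_nocheck(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/* Checked helper: validates the range, then reuses the unchecked body. */
static inline uint64_t extract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return extract64_nocheck(value, start, length);
}
```

The trade-off is that every hot call site has to be found and converted by hand, which is the difficulty raised in the reply that follows.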
Just to clarify again, the asserts are still there when compiled with
--enable-debug. The patch only turns them off for optimised release builds,
which I think makes sense if these asserts are meant to catch programming
errors. I think I've also suggested adding noassert versions of these, but
that wasn't a popular idea, and it may also not be easy to convert every
place to use them, for example the register-field usage in target/ppc,
as that would also affect other places. So this seems to be the
simplest and most effective approach.
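Keeping the checks only in debug builds could look roughly like the sketch below. BITOPS_DEBUG and bitops_assert are hypothetical names standing in for whatever the build defines for --enable-debug; the actual patch may key off a different macro.

```c
#include <assert.h>
#include <stdint.h>

/* BITOPS_DEBUG is a hypothetical macro assumed to be set for debug builds. */
#ifdef BITOPS_DEBUG
#define bitops_assert(cond) assert(cond)
#else
/* Release builds: cond is still type-checked via sizeof but never evaluated. */
#define bitops_assert(cond) ((void)sizeof(cond))
#endif

static inline uint64_t extract64(uint64_t value, int start, int length)
{
    bitops_assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}
```

Using sizeof keeps the condition compiling in release builds, so a typo in it is still caught even though no check runs.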
The softfloat-related usage in the tests I've done seems to come mostly
from unpacking and repacking floats in softfloat, which is done for every
operation. For example, muladd, which mp3 encoding mostly uses, does 3
unpacks and 1 pack for each call, and each unpack is 3 extracts, so even
small overheads add up quickly. A single muladd results in at least 9
extracts and 2 deposits, plus updating the PPC flags for each FPU op adds a
bunch more. I did some profiling with perf to find these.
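The call counts above can be checked with a small model. This is a sketch under stated assumptions: muladd_skeleton, unpack and pack are hypothetical simplifications that model only the unpack/pack traffic (no arithmetic, no PPC flag updates), and pack is assumed to place the sign with a plain shift, which is what makes it cost 2 deposits rather than 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Call counters, purely for illustration. */
static unsigned n_extract, n_deposit;

static inline uint64_t extract64(uint64_t value, int start, int length)
{
    n_extract++;
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~0ULL >> (64 - length));
}

static inline uint64_t deposit64(uint64_t value, int start, int length,
                                 uint64_t fieldval)
{
    uint64_t mask;

    n_deposit++;
    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~0ULL >> (64 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

enum { F_SIZE = 52, E_SIZE = 11 };

typedef struct { bool sign; int exp; uint64_t frac; } Parts64;

/* Unpack: three extracts (sign, exponent, fraction). */
static Parts64 unpack(uint64_t raw)
{
    Parts64 p;
    p.sign = extract64(raw, F_SIZE + E_SIZE, 1);
    p.exp = (int)extract64(raw, F_SIZE, E_SIZE);
    p.frac = extract64(raw, 0, F_SIZE);
    return p;
}

/* Pack: sign by a plain shift, exponent and fraction by two deposits. */
static uint64_t pack(Parts64 p)
{
    uint64_t ret = (uint64_t)p.sign << (F_SIZE + E_SIZE);
    ret = deposit64(ret, F_SIZE, E_SIZE, (uint64_t)p.exp);
    ret = deposit64(ret, 0, F_SIZE, p.frac);
    return ret;
}

/* Hypothetical muladd skeleton: arithmetic omitted, operand a repacked. */
static uint64_t muladd_skeleton(uint64_t a, uint64_t b, uint64_t c)
{
    Parts64 pa = unpack(a), pb = unpack(b), pc = unpack(c);
    (void)pb;
    (void)pc;
    return pack(pa);
}
```

One call to muladd_skeleton performs 3 unpacks and 1 pack, i.e. 9 extract64() calls and 2 deposit64() calls, each carrying its own range check unless it is folded away.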
Regards,
BALATON Zoltan