On 19.05.2012 00:35, Brian Paul wrote:
> On 05/18/2012 03:54 PM, Roland Scheidegger wrote:
>> Looks ok though I wonder if we really need our own assembly here?
>> In particular if the compiler decides to use sse we really shouldn't use
>> the fp stack for converting floats to ints. fistp is just twice as slow
>> as sse conversion on newer cpus, and additionally it might potentially
>> involve moving values from xmm regs to fp.
>> I suspect something like lroundf() would generate better code than the
>> manual assembly (and far better than the c code) if things are compiled
>> to use sse2 at least (the same is of course true for the other functions
>> like ceil etc.). But I guess that's not available everywhere...
>
> For now, I'm just trying to fix the issue at hand. If anyone wants to
> look into using lroundf() and SSE code, that's great. I'm really not up
> on what's the fastest solution on various CPUs.

Actually, lroundf() is useless here. gcc doesn't seem to have a builtin
for it, and the library function can't rely on the default rounding mode.
It doesn't touch mxcsr, and the resulting code is terrible. In fact it
looks worse than the C code to me. This is what I got (with gcc 4.6.2,
-O2, x86_64):

Dump of assembler code for function lroundf:
=> 0x00007ffff7baca60 <+0>:     movd   %xmm0,%edx
   0x00007ffff7baca64 <+4>:     mov    %edx,%ecx
   0x00007ffff7baca66 <+6>:     mov    %edx,%eax
   0x00007ffff7baca68 <+8>:     shr    $0x17,%ecx
   0x00007ffff7baca6b <+11>:    sar    $0x1f,%eax
   0x00007ffff7baca6e <+14>:    and    $0xff,%ecx
   0x00007ffff7baca74 <+20>:    or     $0x1,%eax
   0x00007ffff7baca77 <+23>:    lea    -0x7f(%rcx),%edi
   0x00007ffff7baca7a <+26>:    cmp    $0x3e,%edi
   0x00007ffff7baca7d <+29>:    jg     0x7ffff7bacaa8 <lroundf+72>
   0x00007ffff7baca7f <+31>:    test   %edi,%edi
   0x00007ffff7baca81 <+33>:    js     0x7ffff7bacad0 <lroundf+112>
   0x00007ffff7baca83 <+35>:    and    $0x7fffff,%edx
   0x00007ffff7baca89 <+41>:    or     $0x800000,%edx
   0x00007ffff7baca8f <+47>:    cmp    $0x16,%edi
   0x00007ffff7baca92 <+50>:    jle    0x7ffff7bacab0 <lroundf+80>
   0x00007ffff7baca94 <+52>:    sub    $0x96,%ecx
   0x00007ffff7baca9a <+58>:    cltq
   0x00007ffff7baca9c <+60>:    shl    %cl,%rdx
   0x00007ffff7baca9f <+63>:    imul   %rdx,%rax
   0x00007ffff7bacaa3 <+67>:    retq
   0x00007ffff7bacaa4 <+68>:    nopl   0x0(%rax)
   0x00007ffff7bacaa8 <+72>:    cvttss2si %xmm0,%rax
   0x00007ffff7bacaad <+77>:    retq
   0x00007ffff7bacaae <+78>:    xchg   %ax,%ax
   0x00007ffff7bacab0 <+80>:    mov    %edi,%ecx
   0x00007ffff7bacab2 <+82>:    mov    $0x400000,%esi
   0x00007ffff7bacab7 <+87>:    cltq
   0x00007ffff7bacab9 <+89>:    sar    %cl,%esi
   0x00007ffff7bacabb <+91>:    mov    $0x17,%ecx
   0x00007ffff7bacac0 <+96>:    add    %esi,%edx
   0x00007ffff7bacac2 <+98>:    sub    %edi,%ecx
   0x00007ffff7bacac4 <+100>:   shr    %cl,%edx
   0x00007ffff7bacac6 <+102>:   imul   %rdx,%rax
   0x00007ffff7bacaca <+106>:   retq
   0x00007ffff7bacacb <+107>:   nopl   0x0(%rax,%rax,1)
   0x00007ffff7bacad0 <+112>:   movslq %eax,%rdx
   0x00007ffff7bacad3 <+115>:   xor    %eax,%eax
   0x00007ffff7bacad5 <+117>:   cmp    $0xffffffff,%edi
   0x00007ffff7bacad8 <+120>:   cmove  %rdx,%rax
   0x00007ffff7bacadc <+124>:   retq
End of assembler dump.

I think a single cvtss2si instruction (which btw only needs sse, not
sse2) would be an order of magnitude faster than this mess, though of
course it would rely on the default rounding mode...

Anyway, I guess if we don't care about rounding, we should probably just
use the C truncation on x86_64 (and maybe on all other cpus except x86,
as I'd guess they'd have some way to do this fast?).

On x86 though, C truncation produces not so good code: it messes with the
fpu control word to adjust the rounding mode. It looks like gcc assumes
the control word is set to the default rounding mode, so I'm not sure why
it was actually causing the failure in the first place. The exception is
when -msse is specified (-mfpmath=sse isn't actually required for just
the conversion to happen with an sse instruction), in which case it's
just the same single cvttss2si instruction as on x86_64.

Though surely in cases where a lot of floats are converted (e.g. all
values in a texture image), adjusting the float control word once to get
correct rounding isn't an issue. x87 is such a mess...

Roland
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
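
For reference, here is a minimal C sketch of the two conversions compared in the message: the plain C cast (truncation, which gcc typically turns into cvttss2si when sse is enabled) and a single cvtss2si via the `_mm_cvtss_si32` intrinsic. The function names are hypothetical and are not taken from Mesa. The sketch assumes the default mxcsr rounding mode (round to nearest, ties to even); on targets without sse it falls back to `lrintf()`, which is an assumption of this sketch rather than something proposed in the thread.

```c
#include <assert.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_set_ss, _mm_cvtss_si32 */
#else
#include <math.h>        /* fallback only: lrintf */
#endif

/* Plain C conversion: always truncates toward zero, whatever the
 * current rounding mode is. */
static int float_to_int_trunc(float f)
{
    return (int)f;
}

/* Conversion that honors the current rounding mode. With SSE this is
 * one cvtss2si; it rounds to nearest even only as long as nobody has
 * changed the rounding bits in mxcsr (assumed here). */
static int float_to_int_nearest(float f)
{
#if defined(__SSE__)
    return _mm_cvtss_si32(_mm_set_ss(f));
#else
    return (int)lrintf(f);
#endif
}

/* Small self-check showing where the two conversions differ. */
static void check_rounding_examples(void)
{
    assert(float_to_int_trunc(3.5f) == 3);
    assert(float_to_int_nearest(3.5f) == 4);
    assert(float_to_int_trunc(-1.7f) == -1);
    assert(float_to_int_nearest(-1.7f) == -2);
    /* Ties go to the even integer under the default rounding mode. */
    assert(float_to_int_nearest(2.5f) == 2);
}
```

Note that lroundf() rounds ties away from zero (2.5 becomes 3) regardless of the rounding mode, which is why the library cannot implement it as a bare cvtss2si.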