Hi Florian. Thorsten and I got down to a fairly optimised version of Frac, in both speed and size:
****
function Frac(const X: ValReal): ValReal; assembler; nostackframe;
asm
  movq      rax, xmm0        // raw bits of X
  shr       rax, 48          // keep the top 16 bits: sign, exponent, 4 mantissa bits
  and       ax, $7FF0        // isolate the 11-bit biased exponent
  cmp       ax, $4330        // exponent >= 1075, i.e. |X| >= 2^52: no fractional bits
  jge       @@zero
  cvttsd2si rax, xmm0        // truncate towards zero
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4       // X - Trunc(X)
  ret
@@zero:
  xorpd     xmm0, xmm0       // return 0.0
end;
****

It fits into just three 16-byte blocks and is the fastest overall in our tests, although there's a slight penalty when it jumps to @@zero, and that penalty seems to be architecture-dependent (e.g. it slows down for me, but Thorsten didn't see much). Aligning @@zero to a 16-byte boundary may fix this for some, but it doesn't for me. Oh the joy of processor intricacies!

Gareth aka Kit

On Sun 29/04/18 19:28, Florian Klaempfl flor...@freepascal.org sent:

On 28.04.2018 at 17:57, Thorsten Engler wrote:
>> -----Original Message-----
>> From: fpc-devel On Behalf Of Florian Klämpfl
>>
>> So something like
>>
>> cmp edx, $43300000
>> jge @@zero
>> cmp edx, $3FE00000
>> .align 16
>> jbe @@skip
>>
>> might be much better.
>
> That ended up making things worse in some cases.

Can you take a look at the generated machine code to see whether Delphi uses proper multi-byte NOPs? If not, the align might indeed make things worse.
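For anyone following the exponent test in the routine at the top of this message, here is a rough C sketch of the same logic. It is not part of the original post and not the FPC implementation; `frac_sketch` is a hypothetical name, and the sketch assumes IEEE 754 binary64 doubles. The mask `0x7FF0` isolates the biased exponent in the top 16 bits, and `0x4330` corresponds to an exponent of 1075 (1023 + 52), from which point a double has no fractional bits left.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the logic of the posted assembly.
   Assumes double is IEEE 754 binary64. */
double frac_sketch(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                  /* movq rax, xmm0 */

    /* shr rax, 48 ; and ax, $7FF0: biased exponent, sign bit masked off */
    uint16_t top = (uint16_t)((bits >> 48) & 0x7FF0);

    /* Biased exponent >= 1075 means |x| >= 2^52 (this also catches
       infinities and NaNs), so the result is defined as zero here. */
    if (top >= 0x4330)
        return 0.0;

    /* |x| < 2^52, so the conversion to int64_t cannot overflow.
       Matches cvttsd2si / cvtsi2sd / subsd: x - trunc(x). */
    return x - (double)(int64_t)x;
}
```

Under these assumptions, `frac_sketch(2.75)` gives `0.75` and `frac_sketch(-2.75)` gives `-0.75`, since truncation is towards zero. Note that, like the assembly, the sketch returns zero for infinities and NaNs because their exponent field also passes the comparison.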
_______________________________________________ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel