> From: fpc-devel <fpc-devel-boun...@lists.freepascal.org> On Behalf Of J.
> Gareth Moreton
> Sent: Sunday, 29 April 2018 12:36
> As an extra point, removing the 'skip' check (i.e. cmp ax, $3FE0, jbe @@skip)
> removes 6 bytes from the code size and shaves about 2 to 3 nanoseconds off
> the execution time in most cases, and it could be argued that it's worth
> going for the 'no skip' version because using Frac on a value of x where
> |x| < 1 is rather uncommon compared to when |x| >= 1.

I agree that calling Frac on values that are already just a fraction is
probably not going to happen too often.

> However, when running my timing tests, one thing that's confused me
> is that when using very large inputs like 10^300, the function is
> at least 5 nanoseconds slower than FracSkip2, even though the code
> is less complex. This happens even if I put 'align 16' before the @@zero
> label.

I do not see any noticeable difference between 1e16 and 1e300 as inputs:

Code address:
Frac1: 0000000000536440 (64)
Frac2: 0000000000536490 (16)
Frac3: 00000000005364E0 (96)
Frac4: 0000000000536530 (48)
Frac5: 0000000000536580 (0)
Frac6: 00000000005365D0 (80)
Frac7: 0000000000536620 (32)
Frac8: 0000000000536670 (112)

1st run:

In range (1e15+0.5):
Frac1 923470
Frac2 964422
Frac3 967501
Frac4 1027080
Frac5 1005352
Frac6 1052105
Frac7 1011983
Frac8 1048743

Out of range (1e16+0.5):
Frac1 893526
Frac2 998532
Frac3 894644
Frac4 993987
Frac5 895353
Frac6 994606
Frac7 900848
Frac8 992751

Out of range (1e300):
Frac1 897274
Frac2 986679
Frac3 899123
Frac4 999495
Frac5 899438
Frac6 989588
Frac7 885060
Frac8 985288

Only fraction (0.5):
Frac1 954220
Frac2 1046781
Frac3 993959
Frac4 1015032
Frac5 1013128
Frac6 1043157
Frac7 928712
Frac8 988220

Also, it seems to be relatively resilient against changes in code alignment
even if it's not a multiple of 16:

Code address:
Frac1: 0000000000536433 (51)
Frac2: 000000000053645D (93)
Frac3: 0000000000536487 (7)
Frac4: 00000000005364B1 (49)
Frac5: 00000000005364DB (91)
Frac6: 0000000000536505 (5)
Frac7: 000000000053652F (47)
Frac8: 0000000000536559 (89)

1st run:

In range (1e15+0.5):
Frac1 946247
Frac2 904187
Frac3 902870
Frac4 1025163
Frac5 931021
Frac6 895990
Frac7 1050683
Frac8 952305

Out of range (1e16+0.5):
Frac1 883588
Frac2 877412
Frac3 809785
Frac4 831095
Frac5 976555
Frac6 711201
Frac7 791657
Frac8 897085

Out of range (1e300):
Frac1 902103
Frac2 901861
Frac3 802404
Frac4 808002
Frac5 972999
Frac6 710888
Frac7 804050
Frac8 875901

Only fraction (0.5):
Frac1 945212
Frac2 904468
Frac3 915325
Frac4 997584
Frac5 945569
Frac6 898036
Frac7 1071561
Frac8 906152

> Nevertheless, I conclude that for most situations, using the improved
> FracNoSkip gives the best performance and size for typical inputs,
> but this may depend on an individual machine's architecture.

Seems we got a winner.

I was considering the ret like that, but didn't do it as I was worried
because SEH under Windows expects function prologues and epilogues that
exactly match a specific pattern. But in hindsight, this is a no stack frame
leaf function anyway, so I don't think that matters.

Cheers,
Thorsten
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel